A Note About Rounding 


In this solutions manual, rounded values are written in the calculations, but more 
accurate values were used to arrive at the given answers. 


So, for example, a calculation in the solution to Exercise 13.59, Part (a), is shown 
as 


$$t = \frac{-0.621}{\sqrt{\dfrac{1 - (-0.621)^2}{59}}} = -6.090.$$


When we calculate the value of t using the numbers shown, we get an answer of 
-6.086, not -6.090. However, the value of -0.621 was actually, more accurately, 
-0.6212889827, and it was this value that was used in the calculation, giving 
-6.090, which is the correct value of t to the nearest one-thousandth. 
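
The effect of rounding can be checked numerically. The following Python sketch assumes the statistic has the form r / sqrt((1 - r^2) / 59), matching the calculation shown; the function name is illustrative only and does not come from the textbook.

```python
import math

def t_statistic(r, df):
    """Return r / sqrt((1 - r**2) / df) for a correlation r and df degrees of freedom."""
    return r / math.sqrt((1 - r**2) / df)

# Value as printed in the solution (rounded to three decimal places)
t_from_rounded = t_statistic(-0.621, 59)

# More accurate value actually used to obtain the given answer
t_from_precise = t_statistic(-0.6212889827, 59)

print(round(t_from_rounded, 3))  # -6.086
print(round(t_from_precise, 3))  # -6.09, that is, -6.090
```

The two results differ in the third decimal place even though the inputs agree to three decimal places, which is why the answers in this manual may not match a recalculation from the rounded values shown.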


Michael Allwood 
Brunswick School 
Greenwich, CT 


Chapter 1 
The Role of Statistics and the Data Analysis Process 


Descriptive statistics is the branch of statistics that involves the organization and summary of the 
values in a data set. Inferential statistics is the branch of statistics concerned with reaching 
conclusions about a population based on the information provided by a sample. 


The percentages would have been computed from a sample. 


The population of interest is the set of all 15,000 students at the university. The sample is the two 
hundred students who are interviewed. 


The population is the set of all 7000 property owners. The sample is the 500 owners included in 
the survey. 


The population is the set of 5000 used bricks. The sample is the set of 100 bricks she checks. 


a The researchers wanted to compare the effectiveness of the new flu vaccine (administered by 
nasal spray) with the effectiveness of the conventional vaccine (administered by injection). 
They were motivated to learn whether the new vaccine significantly reduced the incidence of 
influenza (when compared to a placebo) and whether the incidence of ear infections would be 
reduced in those children who did contract influenza. 


First, it is not stated whether the subjects in the experiment were randomly assigned to the 
treatments; this would be necessary in a well designed experiment. Second, in order to 
compare the effectiveness of the new and old vaccines, it might have been useful to include a 
group of subjects who are given the conventional vaccine (although the results of previous 
studies could possibly be used for this purpose). Third, in order to determine whether the new 
vaccine significantly reduced the incidence of ear infections, a larger number of subjects 
needed to be included in the group of subjects who were given the new vaccine. With just one 
percent of the 1070 subjects contracting influenza (approximately 11 subjects), it is not 
possible to make, with a reasonable degree of confidence, an accurate estimate of the 
proportion of flu contractors who go on to contract the ear infection. 


Categorical 

Categorical 

Numerical (discrete) 
Numerical (continuous) 
Categorical 


Numerical (continuous) 
1.21 


Continuous 

Continuous 

Continuous 

Discrete 

Gender of purchaser, brand of motorcycle, telephone area code 
Number of previous motorcycles 

Bar chart 


Dotplot 


[Dotplot. Horizontal axis: Cost (cents per gram of protein), with tick marks from 1.5 to 7.0 in steps of 0.5.] 


The costs per gram of protein for the meat and poultry items are represented by squares in the 
dotplot above. With every one of the meat and poultry items included in the lowest seven cost 
per gram values, meat and poultry items appear to be relatively low cost sources of protein. 


[Bar chart. Vertical axis: Frequency; horizontal axis: Primary Reason for Leaving.] 


The most common reason was financial, this accounting for 30.2% of students who left for 
non-academic reasons. The next two most common reasons were health and other personal reasons, 
these accounting for 19.0% and 15.9%, respectively, of the students who left for non-academic 
reasons. 
1.23 a The dotplot shows that there were two sites that received far greater numbers of visits than 
the remaining 23 sites. Also, it shows that the distribution of the number of visits has the 
greatest density of points for the smaller numbers of visits, with the density decreasing as the 
number of visits increases. This is the case even when only the 23 less popular sites are 
considered. 


b Again, it is clear from the dotplot that there were two sites that were used by far greater 
numbers of individuals (unique visitors) than the remaining 23 sites. However, these two sites 
are less far above the others in terms of the number of unique visitors than they are in terms 
of the total number of visits. As with the distribution of the total number of visits, the 
distribution of the number of unique visitors has the greatest density of points for the smaller 
numbers of visitors, with the density decreasing as the number of unique visitors increases. 
This is the case even when only the 23 less popular sites are considered. 


c The statistic “visits per unique visitor” tells us how heavily the individuals are using the sites. 
Although the table tells us that the most popular site (Facebook) in terms of the other two 
statistics also has the highest value of this statistic, the dotplot of visits per unique visitor 
shows that no one or two individual sites are far ahead of the rest in this respect. 


1.25 a [Dotplot. Axis: Wireless %.] 


b Looking at the dotplot we can see that Eastern states have, on average, lower wireless 
percents than states in the other two regions. The West and Middle states regions have, on 
average, roughly equal wireless percents. 


1.27 a When ranking the airlines according to delayed flights, one airline would be ranked above 
another if the probability of a randomly chosen flight being delayed is smaller for the first 
airline than it is for the second airline. These probabilities are estimated using the rate per 
10,000 flights values, and so these are the data that should be used for this ranking. (Note that 
the total number of flights values are not suitable for this ranking. Suppose that one airline 
had a larger number of delayed flights than another airline. It is possible that this could be 
accounted for merely through the first airline having more flights than the second.) 


b There are two airlines, ExpressJet and Continental, which, with 4.9 and 4.1 of every 10,000 
flights delayed, stand out as the worst airlines in this regard. There are two further airlines 
that stand out above the rest: Delta and Comair, with rates of 2.8 and 2.7 delayed flights per 
10,000 flights. All the other airlines have rates below 1.6, with the best rating being for 
Southwest, with a rate of only 0.1 delayed flights per 10,000. 
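
The reasoning in Part (a), that rates and not raw counts should drive the ranking, can be illustrated with a short Python sketch. The two airlines and their counts below are hypothetical values chosen for illustration; they are not data from the exercise.

```python
# Hypothetical counts for two made-up airlines (NOT data from the exercise).
airlines = {
    "Airline A": {"delayed": 30, "flights": 200_000},
    "Airline B": {"delayed": 12, "flights": 40_000},
}

def rate_per_10000(delayed, flights):
    """Number of delayed flights per 10,000 flights."""
    return delayed * 10_000 / flights

rates = {name: rate_per_10000(**counts) for name, counts in airlines.items()}

# Rank from best (lowest rate) to worst (highest rate).
ranking = sorted(rates, key=rates.get)

print(rates)
print(ranking)
```

In this made-up case Airline A has more delayed flights in total, but because it operates far more flights its rate per 10,000 is lower, so it ranks ahead of Airline B.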
1.29 a [Bar chart. Vertical axis: Relative Frequency (%); horizontal axis: Reason.] 


b The categories “Easy access to junk food,” “Eating unhealthy food,” and “Overeating” could 
be combined, since these categories all concern the child’s eating habits. It could be 
considered a good idea to do this since the other three categories represent very distinct 
causes of the overweight condition, while for many children with poor eating habits the 
choice might be somewhat arbitrary as to which of the three dietary factors should be 
considered the most important. 


1.31 [Graph. Recoverable labels: Single parent; Relative Frequency (%).] 


1.35 [Bar chart. Vertical axis: Relative Frequency; horizontal axis: Type of Violation.] 


By far the most frequently occurring violation categories were security (43%) and maintenance 
(39%). The least frequently occurring violation categories were flight operations (6%) and 
hazardous materials (3%). 


[Graph. Axis label: Response.] 
Chapter 2 
Collecting Data Sensibly 


2.1 a This is an observational study. 


b No. It is quite possible, for example, that those children who averaged more than two hours of 
television viewing per day received, generally speaking, a poorer education than those who 
did not, and that it is the poorer education, and not the television viewing, that caused the 
lower reading scores. 


2.3 a This is an observational study. 


b Yes. Since the researchers looked at a random sample of publicly accessible MySpace web 
profiles posted by 18-year-olds, it is reasonable to generalize the stated conclusion to all 
18-year-olds with publicly accessible MySpace profiles. 


c No, it is not reasonable to generalize the stated conclusion to all 18-year-old MySpace users 
since no users without publicly accessible profiles were included in the study. 


d No, it is not reasonable to generalize the stated conclusion to all MySpace users with 
publicly accessible profiles since only 18-year-olds were included in the study. 


2.5 We will refer to students who have a high school GPA of at least 3.5 and a combined SAT score 
of over 1200 as “well qualified” students. It is quite possible that well qualified students who go 
to “most selective” colleges are, on the whole, naturally better motivated than well qualified 
students who go to “least selective” colleges. Therefore, if all the well qualified students who 
were admitted to “least selective” colleges were moved to “most selective” colleges, these 
students would not necessarily achieve the 89% graduation rate achieved by well qualified 
students who were admitted to “most selective” colleges. 


2.7 We are told that moderate drinkers, as a group, tended to be better educated, wealthier, and more 
active than nondrinkers. It is therefore quite possible that the observed reduction in the risk of 
heart disease amongst moderate drinkers is caused by one of these attributes and not by the 
moderate drinking. 


2.9 It is not appropriate to draw the conclusion stated since it is quite possible that babies born to 
mothers with diabetes differ, in some relevant way other than the experience of pain in early life, 
from babies born to mothers without diabetes. For example, it could be suggested that babies born 
to mothers with diabetes have, due to the exposure during early development to their mothers’ 
blood, a greater susceptibility to stress in general than babies born to mothers without diabetes. 
Thus the grimacing and crying observed amongst these babies when having blood drawn could be 
caused by this greater susceptibility to stress and not by the pain experienced early in life. 


2.11 a The data would need to be collected from a simple random sample of affluent Americans. 


b No. Since the survey included only affluent Americans the result cannot be generalized to all 
Americans. 
2.13 Method 1: Using a computer list of the graduates, number the graduates 1-140. Use a random 
number generator on a calculator or computer to randomly select a whole number between 1 and 
140. The number selected represents the first graduate to be included in the sample. Repeat the 
number selection, ignoring repeated numbers, until 20 graduates have been selected. 

Method 2: Using a computer list of the graduates, number the graduates 001-140. Take the first 
three digits from the left hand end of a row from a table of random digits. If the three-digit 
number formed is between 001 and 140 inclusive, the graduate with that number should be the 
first graduate in the sample. If the number formed is not between 001 and 140 inclusive, the 
number should be ignored. Repeat the process described for the next three digits in the random 
number table, and continue in the same way until 20 graduates have been selected. (Three-digit 
numbers that are repeats of numbers previously selected should be ignored.) 
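
Method 1 can be sketched in Python as follows. This is only an illustration of the procedure described above: the function name and the seed are arbitrary choices, and Python's standard `random` module stands in for "a calculator or computer".

```python
import random

def simple_random_sample(population_size, sample_size, rng):
    """Draw whole numbers between 1 and population_size, ignoring repeats,
    until sample_size distinct numbers have been selected."""
    selected = []
    while len(selected) < sample_size:
        number = rng.randint(1, population_size)  # both endpoints included
        if number not in selected:                # ignore repeated numbers
            selected.append(number)
    return selected

rng = random.Random(2013)  # arbitrary seed so the run can be repeated
sample = simple_random_sample(140, 20, rng)
print(sorted(sample))
```

The same kind of sample can be obtained in a single call with `random.sample(range(1, 141), 20)`; the loop is written out here only to mirror the steps in the solution.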


2.15 Using a computer list of the cases, number the cases 1-870. Use a random number generator on a 
calculator or computer to randomly select a whole number between 1 and 870. The number 
selected represents the first case to be included in the sample. Repeat the number selection, 
ignoring repeated numbers, until 50 cases have been selected. 


2.17 The method used by researcher B is preferable. It is quite possible that the rows will differ in 
terms of the sugar content of fruit from the trees. In the method used by researcher A the sample 
obtained will necessarily include trees from exactly six rows. However, in the method used by 
researcher B the sample will very likely include trees from a much greater number of rows, and is 
therefore more likely to be representative of the population of trees. 


2.19 a Using the list, first number the part time students 1-3000. Use a random number generator on 
a calculator or computer to randomly select a whole number between 1 and 3000. The 
number selected represents the first part time student to be included in the sample. Repeat the 
number selection, ignoring repeated numbers, until 10 part time students have been selected. 
Then number the full time students 1-3500 and select 10 full time students using the same 
procedure. 


b No. With 10 part time students being selected out of a total of 3000 part time students, the 
probability of any particular part time student being selected is 10/3000 = 1/300. Applying a 
similar argument to the full time students, the probability of any particular full time student 
being selected is 10/3500 = 1/350. Since these probabilities are different, it is not the case that 
every student has the same chance of being included in the sample. 
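
The stratified selection in Part (a) and the probabilities in Part (b) can be sketched in Python. The variable names and the seed are illustrative assumptions; the stratum sizes and sample sizes are those given in the solution.

```python
import random
from fractions import Fraction

strata = {"part time": 3000, "full time": 3500}  # stratum sizes from the solution
per_stratum = 10                                  # students selected from each stratum

rng = random.Random(219)  # arbitrary seed

# Part (a): select 10 distinct student numbers separately within each stratum.
sample = {
    name: rng.sample(range(1, size + 1), per_stratum)
    for name, size in strata.items()
}

# Part (b): chance that any particular student in a stratum is selected.
inclusion_probability = {
    name: Fraction(per_stratum, size) for name, size in strata.items()
}

print(inclusion_probability["part time"])  # 1/300
print(inclusion_probability["full time"])  # 1/350
```

Because the two fractions differ, students in the two strata do not have the same chance of being included, which is the point made in Part (b).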


2.21 a The pages of the book have already been numbered between 1 and the highest page number 
in the book. Use a random number generator on a calculator or computer to randomly select a 
whole number between 1 and the highest page number in the book. The number selected will 
be the first page to be included in the sample. Repeat the number selection, ignoring repeated 
numbers, until the required number of pages has been selected. 


b Pages that include exercises tend to contain more words than pages that do not include 
exercises. Therefore, it would be sensible to stratify according to this criterion. Assuming that 
20 non-exercise pages and 20 exercise pages will be included in the sample, the sample should 
be selected as follows. Use a random number generator to randomly select a whole number 
between 1 and the highest page number in the book. The number selected will be the first page 
to be included in the sample. Repeat the number selection, ignoring repeated numbers and 
keeping track of the number of pages of each type selected, until 20 pages of one type have 
been selected. Then continue in the same way, but ignore numbers corresponding to pages of 
that type. When 20 pages of the other type have been selected, stop the process. 
c Randomly select one page from the first 20 pages in the book. Include in your sample that 
page and every 20th page from that page onwards. 


d Roughly speaking, in terms of the numbers of words per page, each chapter is representative 
of the book as a whole. It is therefore sensible for the chapters to be used as clusters. Using a 
random number generator randomly choose three chapters. Then count the number of words 
on each page in those three chapters. 


e Answers will vary. 
f Answers will vary. 
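
The systematic sample of Part (c) and the cluster sample of Part (d) can be sketched in Python. The book layout used here (400 pages in 10 chapters of 40 pages each) is a hypothetical assumption made only so that the code can run; the function names are likewise illustrative.

```python
import random

def systematic_sample(last_page, step, rng):
    """Pick a random start among the first `step` pages, then take every
    `step`-th page from that page onwards."""
    start = rng.randint(1, step)
    return list(range(start, last_page + 1, step))

def cluster_sample(chapter_pages, n_chapters, rng):
    """Randomly choose whole chapters (clusters) and return every page in them."""
    chosen = rng.sample(sorted(chapter_pages), n_chapters)
    pages = [page for chapter in chosen for page in chapter_pages[chapter]]
    return chosen, pages

rng = random.Random(221)  # arbitrary seed

# Hypothetical book: 400 pages, 10 chapters of 40 pages each.
LAST_PAGE = 400
chapters = {c: range(40 * (c - 1) + 1, 40 * c + 1) for c in range(1, 11)}

systematic = systematic_sample(LAST_PAGE, 20, rng)
chosen_chapters, cluster_pages = cluster_sample(chapters, 3, rng)

print(systematic)
print(chosen_chapters, len(cluster_pages))
```

In the systematic sample only the starting page is random; in the cluster sample the randomness lies in which chapters are chosen, after which every page in those chapters is counted.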


2.23 The researchers should be concerned about nonresponse bias. Only a small proportion (20.7%) of 
the selected households completed the interview, and it is quite possible that those households 
who did complete the interview are different in some relevant way concerning Internet use from 
those who did not. 


2.25 First, the participants in the study were all students in an upper-division communications course 
at one particular university. It is not reasonable to consider these students to be representative of 
all students with regard to their truthfulness in the various forms of communication. Second, the 
students knew during the week’s activity that they were surveying themselves as to the 
truthfulness of their interactions. This could easily have changed their behavior in particular 
social contexts and therefore could have distorted the results of the study. 


2.27 First, the people who responded to the print and online advertisements might be different in some 
way relevant to the study from the population of people who have online dating profiles. Second, 
only the Village Voice and Craigslist New York City were used for the recruitment. It is quite 
possible that people who read that newspaper or access those websites differ from the population 
in some relevant way, particularly considering that they are both New York City based 
publications. 


2.29 The individuals within each stratum should on the whole be similar in terms of the topic of the 
study. This is true of the proposed strata in Scheme 2, since it is likely that college students will 
on the whole be similar in their opinions regarding the possible tax increase; likewise nonstudents 
who work full time will on the whole be similar in their opinions regarding the possible tax 
increase, and nonstudents who do not work full time will on the whole be similar in their opinions 
regarding the possible tax increase. Scheme 1, however, is not suitable since we have no reason to 
believe that people within the proposed first-letter-of-last-name strata will be similar in terms of 
their attitudes to the possible tax increase. Similarly the suggested stratification in Scheme 3 is 
very unlikely to produce homogeneous groups. 


2.31. Different subsets of the population might have responded by different methods. For example, it is 
quite possible that younger people (who might generally be in favor of continuing the parade) 
chose to respond via the Internet while older people (who might on the whole be against the 
parade) chose to use the telephone to make their responses. 
2.33 a Binding strength 
b Type of glue 


c The extraneous variables mentioned are the number of pages in the book and whether the 
book is bound as a hardback or a paperback. Further extraneous variables that might be 
considered include the weight of the material used for the cover and the type of paper used. 


2.35 Random assignment should have been used to determine, for each cyclist, which drink would be 
consumed during which break. 


2.37 We rely on random assignment to produce comparable experimental groups. If the researchers 
had hand-picked the treatment groups, they might unconsciously have favored one group over the 
other in terms of some variable that affects the subjects’ ability to deal with multiple inputs. 


2.39 a If the participants had been able to choose their own avatars, then it is quite possible, for 
example, that people with a lot of self confidence would tend to choose the attractive avatar 
while those with less self confidence would tend to choose the unattractive avatar. Then, if 
the same result was obtained as the one described in the paper, it would be impossible to tell 
whether the greater closeness achieved by those with the attractive avatar came about as a 
result of the avatar or as a result of those people’s greater self confidence. 


[Diagram of the experimental design: 
Participants -> Random assignment -> Attractive avatar -> Measure closeness 
Participants -> Random assignment -> Unattractive avatar -> Measure closeness 
Final step: Compare closeness for attractive avatar vs. unattractive avatar] 


2.41 We rely on random assignment to produce comparable experimental groups. If the researchers 
had hand-picked the treatment groups, they might unconsciously have favored one group over the 
other in terms of some variable that affects the subjects’ ability to learn through video gaming 
activity. 
2.43 [Diagram of the experimental design: 
30 trials -> Random assignment -> Additive 1 -> Measure distance traveled 
30 trials -> Random assignment -> Additive 2 -> Measure distance traveled 
30 trials -> Random assignment -> Additive 3 -> Measure distance traveled 
Final step: Compare distances traveled for the three additives] 
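
The random assignment step in this design can be sketched in Python. The sketch assumes that the 30 trials are split equally, 10 to each additive; the solution does not state the group sizes, so this is an assumption, as are the function name and the seed.

```python
import random

def randomly_assign(units, treatments, rng):
    """Shuffle the units and deal them out in equal-sized groups, one per treatment."""
    shuffled = list(units)
    if len(shuffled) % len(treatments) != 0:
        raise ValueError("units cannot be split equally among the treatments")
    rng.shuffle(shuffled)
    group_size = len(shuffled) // len(treatments)
    return {
        treatment: sorted(shuffled[i * group_size:(i + 1) * group_size])
        for i, treatment in enumerate(treatments)
    }

rng = random.Random(243)  # arbitrary seed
trials = range(1, 31)     # trials numbered 1-30
assignment = randomly_assign(trials, ["Additive 1", "Additive 2", "Additive 3"], rng)

for additive, assigned_trials in assignment.items():
    print(additive, assigned_trials)
```

Shuffling the whole list once and cutting it into groups gives every trial the same chance of receiving each additive, which is what makes the three groups comparable.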


2.45 a The improvement in group 3 compared to group 1 cannot be attributed to the use of Sweet 
Talk since group 3 differs from group 1 in two respects: the incorporation of Sweet Talk and 
the use of the new intensive insulin therapy in place of the conventional insulin therapy. 
Therefore it is not possible to tell whether the improvement is attributable to Sweet Talk, the 
intensive insulin therapy, or a combination of the two. (Note that the fact that there is no 
significant difference in the results for groups 1 and 2 suggests that Sweet Talk is not 
beneficial when used in conjunction with the conventional insulin treatment. It does not tell 
us whether Sweet Talk would be helpful when the intensive insulin treatment is being used.) 


b The experiment needs to be modified by the addition of a group (group 4) that receives the 
intensive insulin therapy without Sweet Talk support. Then a comparison between the results 
of groups 3 and 4 will tell the experimenters whether Sweet Talk presents an improvement 
when the intensive insulin therapy is being used. 
c [Diagram of the experimental design: 
Participants -> Random assignment -> Conventional insulin, no Sweet Talk -> Measure glucose concentration 
Participants -> Random assignment -> Conventional insulin with Sweet Talk -> Measure glucose concentration 
Participants -> Random assignment -> Intensive insulin, no Sweet Talk -> Measure glucose concentration 
Participants -> Random assignment -> Intensive insulin with Sweet Talk -> Measure glucose concentration 
Final step: Compare glucose concentration for the four treatments] 


2.47 a Red wine, yellow onions, black tea 
b Absorption of flavonol into the blood 


c Gender, amount of flavonols consumed apart from experimental treatment, tolerance of 
alcohol in wine 


2.49 “Blinding” is ensuring that the experimental subjects do not know which treatment they were 
given and/or ensuring that the people who measure the response variable do not know who was 
given which treatment. When this is possible to implement, it is useful that the subjects do not 
know which treatments they were given since, if a person knows what treatment he/she was 
given, this knowledge could influence the person’s perception of the response variable, or even, 
through psychological processes, have a direct effect on the response variable. If the response 
variable is to be measured by a person other than the experimental subjects it is useful if this 
person doesn’t know who received which treatment since, if this person does know who received 
which treatment, then this could influence the person’s perception of the response variable. 


2.51 a In order to know that the results of this experiment are valid it is necessary to know that the 
assignment of the women to the groups was done randomly. For suppose, for example, that 
the women were allowed to choose which groups they went into. Then it would be quite 
possible, for instance, that women who are particularly social by nature, and therefore whose 
health would be enhanced by any regular social gathering, would choose the more interesting 
sounding art discussions, while those less social by nature (and therefore less likely to be 
helped by social gatherings) would choose the more conventional discussions of hobbies and 
interests. Then it would be impossible to tell whether the stated results were caused by the 
discussions of art or by the greater social nature of the women in the art discussion group. 


b Suppose that all the women took part in weekly discussions of art, and that generally an 
improvement in the medical conditions mentioned was observed amongst the subjects. Then 
it would be impossible to tell whether these health improvements had been caused by the 
discussions of art or by some factor that was affecting all the subjects, such as an 
improvement in the weather over the four months. By including a control group, and by 
observing that the improvements did not take place (generally speaking) for those in the 
control group, factors such as this can be discounted, and the discussions of art are 
established as the cause of the improvements. 


© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 


2.53 We will assume that only four colors will be compared, and that only headache sufferers will be 
included in the study. 


Prepare a supply of “Regular Strength” Tylenol in four different colors: white (the current color 
of the medication, and therefore the “control”), red, green, and blue. Recruit 20 volunteers who 
suffer from headaches. Instruct each volunteer not to take any pain relief medication for a week. 
After that week is over, issue each volunteer a supply of all four colors. Give each volunteer an 
order in which to use the colors (this order would be determined randomly for each volunteer). 
Instruct the volunteers to use one fixed dose of the medication for each headache over a period of 
four weeks, and to note on a form the color used and the pain relief achieved (on a scale of 0-10, 
where 0 is no pain relief and 10 is complete pain relief). At the end of the four weeks gather the 
results and compare the pain relief achieved by the four colors. 
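
The step in which each volunteer is given a randomly determined order of colors can be sketched in Python. The function name and the seed are illustrative assumptions; the four colors and the 20 volunteers are those named in the design above.

```python
import random

COLORS = ["white", "red", "green", "blue"]  # white is the control color

def random_color_orders(n_volunteers, rng):
    """Give each volunteer an independently shuffled order of the four colors."""
    orders = {}
    for volunteer in range(1, n_volunteers + 1):
        order = COLORS[:]   # copy so the master list is left unchanged
        rng.shuffle(order)
        orders[volunteer] = order
    return orders

rng = random.Random(253)  # arbitrary seed
orders = random_color_orders(20, rng)

for volunteer, order in orders.items():
    print(volunteer, order)
```

Randomizing the order separately for each volunteer guards against any one color being systematically used earlier or later in the four-week period.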


2.55 Suppose that the dog handlers and/or the experimental observers had known which patients did 
and did not have cancer. It would then be possible for some sort of (conscious or unconscious) 
communication to take place between these people and the dogs so that the dogs would pick up 
the conditions of the patients from these people rather than through their perception of the 
patients’ breath. By making sure that the dog handlers and the experimental observers do not 
know who has the disease and who does not it is ensured that the dogs are getting the information 
from the patients. 


2.57 a If the judges had known which chowder came from which restaurant then it is unlikely that 
Denny’s chowder would have won the contest, since the judges would probably be 
conditioned by this knowledge to choose chowders from more expensive restaurants. 


b In experiments, if the people measuring the response are not blinded they will often be 
conditioned to see different responses to some treatments over other treatments, in the same 
way as the judges would have been conditioned to favor the expensive restaurant chowders. It 
is therefore necessary that the people measuring the response should not know which subject 
received which treatment, so that the treatments can be compared on their own merits. 


2.59 a A placebo group would be necessary if the mere thought of having amalgam fillings could 
produce kidney disorders. However, since the experimental subjects were sheep the 
researchers do not need to be concerned that this would happen. 


b A resin filling treatment group would be necessary in order to provide evidence that it is the 
material in the amalgam fillings, rather than the process of filling the teeth, or just the 
presence of foreign bodies in the teeth, that is the cause of the kidney disorders. If the 
amalgam filling group developed the kidney disorders and the resin filling group did not, then 
this would provide evidence that it is some ingredient in the amalgam fillings that is causing 
the kidney problems. 


c Since there is concern about the effect of amalgam fillings it would be considered unethical to 
use humans in the experiment. 
Answers will vary. 

Answers will vary. 

Answers will vary. 


2.67 a This is an observational study. 

b In order to evaluate the study, we need to know whether the sample was a random sample. 


c No. Since the sample used in the Healthy Steps study was known to be nationally 
representative, and since the paper states that, compared with the HS trial, parents in the 
study sample were disproportionately older, white, more educated, and married, it is clear that 
it is not reasonable to regard the sample as representative of parents of all children at age 5.5 
years. 


d The potential confounding variable mentioned is what the children watched. 


e The quotation from Kamila Mistry makes a statement about cause and effect and therefore is 
inconsistent with the statement that the study can’t show that TV was the cause of later 
problems. 


2.69 Answers will vary. 


2.71 The first criticism describes measurement bias. Asking people whether they are talking less to 
family and friends on the phone could be a biased measure of increased social isolation. First, 
people might be reluctant to give truthful answers to this question, and second, the question 
ignores face-to-face contact with family and friends. It is possible, for example, that face-to-face 
interaction might be increasing while phone contact is decreasing. The second criticism describes 
selection bias. Since this survey about Internet use was based on a group of people who were 
induced to participate by the offer of free Internet service, it is not reasonable to generalize the 
results to all US adults. 


2.73 We rely on random assignment to produce comparable experimental groups. If the researchers 
had hand-picked the treatment groups, they might unconsciously have favored one group over the 
other in terms of some variable that affects the ability of the people at the centers to respond to 
the materials provided. 


2.75 a Observational study 


b It is quite possible that the children who watched large amounts of TV in their early years 
were also those, generally speaking, who received less attention from their parents, and it was 
the lack of attention from their parents that caused the later attention problems, not the 
TV-watching. 


2.77 It is possible, for example, that people who are not married are more likely to go out alone 
(except for the widowed, who are older and therefore tend to stay home). It could then be this 
going out alone that is causing the risk of being a victim of violent crime, not the marital status. 


All the participants were women, from Texas, and volunteers. All three of these facts tell us that it 
is likely to be unreasonable to generalize the results of the study to all college students. 
is likely to be unreasonable to generalize the results of the study to all college students. 
2.81 a The extraneous variables identified are gender, age, weight, lean body mass, and capacity to 
lift weights. They were dealt with by direct control: all the volunteers were male, about the 
same age, and similar in weight, lean body mass, and capacity to lift weights. 


b Yes, it is important that the men were not told which treatment they were receiving, otherwise 
the effect of giving a placebo would have been removed. If the participants were told which 
treatment they were receiving, then those taking the creatine would have the additional effect 
of the mere taking of a supplement thought to be helpful (the placebo effect) and those 
getting the fake preparation would not get this effect. It would then be impossible to 
distinguish the influence of the placebo effect from the effect of the creatine itself. 


c Yes, it would have been useful if those measuring the increase in muscle mass had not known 
who received which treatment. It is possible that, through having this knowledge, the people 
would have been unconsciously influenced into exaggerating the increase in muscle mass for 
those who took the creatine. 


2.83 The design could be completely randomized or could involve blocking. The following is a 
completely randomized design. 


Divide the plot into a 4 by 4 grid consisting of 16 equally sized square subplots. Number the 
subplots 1-16. Use a random number generator to select integers between 1 and 16 inclusive. 
Ignoring repeats, the subplots represented by the first four integers will receive undisturbed native 
grasses. The subplots represented by the following four integers will receive managed native 
grasses. The subplots represented by the following four integers will receive undisturbed 
nonnative grasses. The remaining four subplots will receive managed nonnative grasses. 
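
The assignment just described can be sketched in Python. This is only an illustration under stated assumptions: the function name, the treatment labels, and the seed are chosen here for readability and reproducibility and are not part of the exercise. Drawing all 16 subplot numbers in random order without replacement is equivalent to generating random integers and ignoring repeats.

```python
import random

# Treatment labels are shorthand chosen for this sketch.
TREATMENTS = [
    "undisturbed native",
    "managed native",
    "undisturbed nonnative",
    "managed nonnative",
]

def assign_subplots(seed=None):
    """Randomly assign subplots 1-16 to the four grass treatments, four each."""
    rng = random.Random(seed)
    order = rng.sample(range(1, 17), 16)  # all 16 subplots in random order, no repeats
    # First four subplots drawn get the first treatment, the next four the second, etc.
    return {t: sorted(order[4 * i: 4 * i + 4]) for i, t in enumerate(TREATMENTS)}

assignment = assign_subplots(seed=1)  # the seed is arbitrary
for treatment, subplots in assignment.items():
    print(treatment, subplots)
```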


Some possible confounding variables are the amount of light a subplot receives, the amount of 
moisture in a subplot, and whether or not a subplot is on the boundary of the grid. (One of these 
variables, amount of light, for example, will actually be a confounding variable if one particular 
type of grass is assigned to subplots with more light than the other types of grass.) 


(A design using blocking would need to include blocks consisting of subplots that are similar in 
terms of one or more possible confounding variables. The subplots within each block would be 
randomly assigned to the four grasses.) 


This is an experiment, since the treatments (the different types of grass) are imposed on the 
subplots, rather than using areas of land that already have the types of grass mentioned. 


2.85 There are many possible designs. We will describe here a design that blocks for the day of the 
week and the section of the newspaper in which the advertisement appears. For the sake of 
argument we will assume that the mortgage lender is interested in advertising on only two days of 
the week (Monday and Tuesday) and that there are three sections in the newspaper (A, B, and C). 
We will refer to the three types of advertisement as Ad 1, Ad 2, and Ad 3. 


The experimental units are 18 issues of the newspaper (that is, 18 dates) consisting of Mondays 
and Tuesdays over 9 weeks. Use a random process to decide which three Mondays will receive 
advertisements in Section A, which three Mondays will receive advertisements in Section B, and 
which three Mondays will receive advertisements in Section C. Do the same for the nine 
Tuesdays. We have now effectively split the 18 issues into the six blocks shown below. (There 
are 3 issues in each block.) 
Mon, Sect A    Mon, Sect B    Mon, Sect C 
Tue, Sect A    Tue, Sect B    Tue, Sect C 


Now randomly assign the three issues in each block to the three advertisements. (Ad 1 is then 
appearing on three Mondays, once in each section, and on three Tuesdays, once in each section. 
The same applies to Ad 2 and Ad 3.) The response levels for the three advertisements can now be 
compared (as can the three different sections and the two different days). 
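
One way to carry out this two-stage randomization is sketched below in Python. The function name, the week numbering, and the seed are assumptions made for illustration; the design itself (two days, three sections, nine weeks, three advertisements) is the one described above.

```python
import random

ADS = ["Ad 1", "Ad 2", "Ad 3"]
SECTIONS = ["A", "B", "C"]

def blocked_design(seed=None):
    """Return a dict mapping (day, week) to (section, ad) for 18 issues."""
    rng = random.Random(seed)
    design = {}
    for day in ("Mon", "Tue"):
        # Stage 1: randomly split the nine issues for this day into three
        # blocks of three, one block per newspaper section.
        week_order = rng.sample(range(1, 10), 9)
        for i, section in enumerate(SECTIONS):
            block = week_order[3 * i: 3 * i + 3]
            # Stage 2: within the block, randomly assign the three ads.
            for week, ad in zip(block, rng.sample(ADS, 3)):
                design[(day, week)] = (section, ad)
    return design

for issue, (section, ad) in sorted(blocked_design(seed=3).items()):
    print(issue, "Section", section, ad)
```

Each advertisement then appears exactly once in every day-and-section block, which is what allows the three advertisements to be compared fairly.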
a The second and third categories (“Permitted for business purposes only” and “Permitted for 
limited personal use”) were combined into one category (“No, but some limits apply”). 


[Bar chart: Response Category (including Don’t know, Permitted for personal use, Limited personal use) against Responses] 


c Pie chart, regular bar graph 
3.5 
[Bar graph: Rel. Frequ. (Percent) by Group; group labels: Explorer Visible Status Quo Non-Teen Isolator] 
Since the number of categories is relatively high, a bar graph is suitable. 
a [Bar graph: Percentage by Country (Japan, France, UK, US, Canada)] 


b Were the surveys carried out on random samples of married women from those countries? 
How were the questions worded? 


c In one country, Japan, the percentage of women who say they never get help from their 
husbands is far higher than the percentages in any of the other four countries included. The 
percentages in the other four countries are similar, with Canada showing the lowest 
percentage of women who say they do not get help from their husbands. 
[Segmented bar graph: Response (Never; A few times a month or less; A few or more times per week), vertical scale 0.0 to 1.0] 
Jil 


[Comparative bar graph: Percent Unfit for males and females, shown separately for adolescents and adults] 


b The comparative bar graph shows that a much higher proportion of adolescents are unfit than 
adults. It also shows that while amongst adolescents the rates of unfitness are roughly the 
same for females and males, amongst adults the rate is significantly higher for females than it 
is for males. 


3.13 a No. A pie chart is unsuitable when there is such a large number of categories. 
[Bar graph: Rel. Frequ. (Percent)] 


Yes, it is easier to see the differences between the relative frequencies for the different 
hazards, particularly for those with small relative frequencies. 


3.15 
10 | 578 
11 | 79 
12 | 1114 
13 | 001122478899 
14 | 0011112235669 
15 | 11122445599 
16 | 1227 
17 | 1 
18 | 
19 |                 Stem: Ones 
20 | 8               Leaf: Tenths 
A typical number of births per thousand of the population is around 14, with most birth rates 
concentrated in the 13.0 to 15.9 range. The distribution has just one peak (at the 14-15 class). 
There is an extreme value, 20.8, at the high end of the data set, and this is the only birth rate 
above 17.1. The distribution is not symmetrical, since it has a greater spread to the right of its 
center than to the left. 
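
A stem-and-leaf display of this kind can be built mechanically. The Python sketch below is an illustration only: the function name is hypothetical, it assumes non-negative data, and it uses just a handful of values read from the display above rather than the full data set.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=0.1):
    """Group values into stems and single-digit leaves.

    With leaf_unit=0.1 the leaf is the tenths digit and the stem is the
    whole-number part, as in the display above. Assumes non-negative data.
    """
    scaled = sorted(round(v / leaf_unit) for v in values)
    rows = defaultdict(list)
    for s in scaled:
        stem, leaf = divmod(s, 10)
        rows[stem].append(leaf)
    low, high = scaled[0] // 10, scaled[-1] // 10
    # Include empty stems so that gaps in the data remain visible.
    return {stem: rows.get(stem, []) for stem in range(low, high + 1)}

sample = [10.5, 10.7, 10.8, 11.7, 11.9, 20.8]  # a few values from the display
for stem, leaves in stem_and_leaf(sample).items():
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in leaves)}")
```

Keeping the empty stems is a deliberate choice: it is the run of empty rows between 11 and 20 that makes the extreme value stand out.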
Lee 


0H | 55567889999 
1L | 0000111113334 
1H | 556666666667789 
2L | 00001122233       Stem: Tens 
2H | 5                 Leaf: Ones 


A typical percentage of households with only a wireless phone is around 15. 


© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 


Chapter 3: Graphical Methods for Describing Data 


West East 


998 | 0H | 555789 
110 | 1L | 00011134 


8766 | 1H | 666 
DY | Ae Oe Stem: Tens 
5) || Asi Leaf: Ones 


A typical percentage of households with only a wireless phone for the West is around 16, 
which is greater than for the East (around 11). There is a slightly greater spread of values in 
the West than in the East, with values in the West ranging from 8 to 25 (a range of 17) and 
values in the East ranging from 5 to 20 (a range of 15). The distribution for the West is 
roughly symmetrical, while the distribution in the East shows a slightly greater spread to the 
right of its center than to the left. Neither distribution has any outliers. 


3.19 a 
a alerOl 


—0 | 99998888776555555444433222211110 
0 | 000011244577 


I Peas) 
2 


Stem: Tens 
Leaf: Ones 


b Split each stem into two, one taking the lower leaves (0-4) and the other taking the higher 
leaves (5-9). So, for example, the stem “0” would be split into “0L” and “0H”, with 0L taking 
the leaves “000011244” and 0H taking the leaves “577”. 




c The three states with the greatest percentage increase in the number of 25- to 44-year-olds are 
Nevada, Utah, and Arizona, all desert states. 
333 

44444455555 

666666666677777777777 

88888888999 Stem: Tens 
0000 Leaf: Ones 


The stem-and-leaf display shows that the distribution of high school dropout rates is roughly 
symmetrical. A typical dropout rate is 7%. The great majority of rates are between 4% and 9%, 


inclusive. 
c The dotplot is more informative as it shows where the data points actually lie. For example, 
in the histogram we can tell that there are 3 observations in the 20 to 25 interval, but we don’t 
see the actual values and miss the fact that these values are actually considerably higher than 
the other values in the data set. 


[Histogram: Frequency against Percentage of Workers who Belong to a Union] 


The histogram in part (a) could be taken to imply that there are states with a percent of 
workers belonging to a union near zero. It is clear from this second histogram that this is not 
the case. Also, the second histogram shows that there is a gap at the high end and that the 
three largest values are noticeably higher than those of the other states. This fact is not clear 
from the histogram in part (a). 
[Histogram: Credit Card Balance (Credit Bureau Data)] 
[Histogram: Credit Card Balance (Survey Data)] 


c The histograms are very similar, except that the Credit Bureau results show 7% of students 
having a debt of $7000 or more, whereas in the survey no student admitted to having a debt 
of this size. 


d Yes. It is quite possible that the students who did not respond included those with a debt of 
over $7000, particularly as students with such a large debt would probably not want to admit 
it. 


3.29 a First, the class intervals do not all have the same width, and so use of relative frequency on 
the y-axis would not be appropriate. Second, we are not given an upper boundary for the last 
class interval, so we don’t have enough information to draw the histogram. 


c By far the highest density of educational debts occurs in the $0-5000 range, with 43% of the 
students having debts in this relatively narrow interval. Amongst the remaining 57% of 
students there seems to be a roughly symmetrical distribution of debts, with the greatest 
density of debts occurring in the $50,000-100,000 range. 


3.31 [Histogram with Density on the vertical axis] 


3.33 Answers will vary. 


3.35 a 


[Histogram: Relative frequency (percent) against Years survived, 0 to 16] 


c The histogram shows a bimodal distribution, with peaks at the 2-4 year and 14-16 year 
intervals. All the other survival times were considerably less common than these two. 


d We would need to know that the set of patients used in the study formed a random sample of 
all patients younger than 50 years old who had been diagnosed with the disease and had 
received the high dose chemotherapy. 


3.37 Answers will vary. One possibility for each part is shown below. 
The graph shows an upward trend in the percentage of homes with only a wireless phone service 
from June 2005 to December 2008. The increase has been at a roughly steady rate, with only the 
periods June to December 2005 and December 2006 to June 2007 showing a slightly lower rate of 
growth. 


a [Scatterplot: Rating against Cost] 


There is a weak relationship between cost and quality rating, with higher costs being loosely 
associated with lower ratings. 
[Scatterplot: Rating against Cost, with points distinguished by Type] 


The range of costs for men’s athletic shoes is slightly greater than for women’s (with just one 
type of men’s shoe providing a cheaper option). For any given cost, there is generally 
speaking a greater spread of ratings for men’s shoes than for women’s, with the women’s 
shoes tending to show slightly higher ratings than the men’s. For women’s shoes the 
relationship between cost and quality rating is very weak. For men’s shoes the relationship is 
stronger than for the women’s (and stronger than for the combined data set). 

3.43 
[Time series plot: Recycled Waste (millions of tons) against Year, 1990 to 2005] 


The plot shows that the amount of waste collected for recycling had grown substantially (not 
slowly, as is stated in the article) in the years 1990 to 2005. The amount increased from under 30 
million tons to nearly 60 million tons in that period, which means that the amount had almost 
doubled. 
3.45 


3.47 


3.49 


3.51 


According to the 2001 and 2002 data, there are seasonal peaks at weeks 4, 9, and 14, and seasonal 
lows at weeks 2, 6, 10-12, and 18. 


a [Segmented bar graph: Enrollment by ethnic group (Unknown/Other, Native American, African American, Hispanic/Latino, Asian American, White)] 


b The graphical display created in Part (a) is more informative, since it gives an accurate 
representation of the proportions of the ethnic groups. 


c The people who designed the original display possibly felt that the four ethnic groups shown 
in the segmented bar section might seem to be underrepresented at the college if they used a 
single pie chart. 


The first graphical display is not drawn appropriately. The Z’s have been drawn so that their 
heights are in proportion to the percentages shown. However, the widths and the perceived depths 
are also in proportion to the percentages, and so neither the areas nor the perceived volumes of 
the Z’s are proportional to the percentages. The graph is therefore misleading to the reader. In the 
second graphical display, however, only the heights of the cars are in proportion to the 
percentages shown. The widths of the cars are all equal. Therefore the areas of the cars are in 
proportion to the percentages, and this is an appropriately drawn graphical display. 


The piles of cocaine have been drawn so that their heights are in proportion to the percentages 
shown. However, the widths are also in proportion to the percentages, and therefore neither the 
areas (nor the perceived volumes) are in proportion to the percentages. The graph is therefore 
misleading to the reader. 
3.53 
[Bar graph: Average Verbal SAT by First Language (English; English and another; Language other than English)] 
3.55 
1 | 9 
2 | 23788999 
3 | 0011112233459    Stem: Tens 
4 | 0123             Leaf: Ones 
A typical calorie content for these light beers is 31 calories per 100 ml, with the great majority 
lying in the 22-39 range. The distribution is negatively skewed, with one peak (in the 30-39 
range). There are no gaps in the data. 
J200 Meee 
0 | 0033344555568888888999999 
1 | 0001223344567 
2 | 001123689 
3 | 0 
4 | 0 
5 |                  Stem: Tens 
6 | 6                Leaf: Ones 


b A typical percentage population increase is around 10, with the great majority of states in the 
0-29 range. The distribution is positively skewed, with one peak (in the 0-9 range). There are 
two states showing significantly greater increases than the other 48 states: one at 40 (Arizona) 
and one at 66 (Nevada). 
On average, the percentage population increases in the West were greater than those for the 
East, with a typical value for the West being around 14 and a typical value for the East being 
around 9. There is a far greater spread in the values in the West, with values ranging from 0 
to 66, than in the East where values ranged from 0 to 26. Both distributions are positively 
skewed, with a single peak for the East data, and two peaks for the West. In the West there 
are two states showing significantly greater increases than the remaining states, with values at 
40 and 66. There are no such extreme values in the East. 


3.59 a High graft weight ratios are clearly associated with low body weights (and vice versa), and 
the relationship is not linear. (In fact there seems to be, roughly speaking, an inverse 
proportionality between the two variables, apart from a small increase in the graft weight 
ratios for increasing body weights amongst those recipients with the greater body weights. 
This is interesting in that an inverse proportionality between the variables would imply that 
the actual weights of transplanted livers are chosen independently of the recipients’ body 
weights.) 


b A likely reason for the negative relationship is that the livers to be transplanted are probably 
chosen according to whatever happens to be available at the time. Therefore, lighter patients 
are likely to receive livers that are too large and heavier patients are likely to receive livers 
that are too small. 
b Continuing the growth trend, we estimate that the average home size in 2010 will be 
approximately 2500 square feet. 
On average, the total tobacco exposure times for the Disney movies are higher than the others, 
with a typical value for Disney being around 90 seconds and a typical value for the other 
companies being around 50 seconds. Both distributions have one peak and are positively skewed. 
There is one extreme value (548) in the Disney data, and no extreme value in the data for the 
other companies. There is a greater spread in the Disney data, with values ranging from 6 seconds 
to 548 seconds, than for the other companies, where the values range from 1 second to 205 
seconds. 
c The segmented bar graph is slightly preferable in that it is a little easier than in the pie chart 
to see that the proportion of children responding “Most of the time” was slightly higher than 
the proportion responding “Some of the time.” 
The peaks were probably caused by the incidence of major hurricanes in those years. 
In every year the number of related donors was much greater than the number of unrelated 
donors. In both categories the number of transplants increased every year, but proportionately 
the increases in unrelated donors were greater than the increases in related donors. 
b The histogram is centered at approximately 0.34, with values ranging from 0.15 to 0.5, plus 
one extreme value in the 0.55-0.6 range. The distribution has a single peak and is slightly 
positively skewed. 
Cumulative Review Exercises 


CR3.1 No. It is quite possible, for example, that men who ate a high proportion of cruciferous vegetables 
generally speaking also had healthier lifestyles than those who did not, and that it was the 
healthier lifestyles that were causing the lower incidence of prostate cancer, not the eating of 
cruciferous vegetables. 


CR3.3 Very often those who choose to respond generally have a different opinion on the subject of the 
study from those who do not respond. (In particular, those who respond often have strong feelings 
against the status quo.) This can lead to results that are not representative of the population that is 
being studied. 


CR3.5 Only a small proportion (around 11%) of the doctors responded, and it is quite possible that those 
who did respond had different opinions regarding managed care from the majority who did not. 
Therefore the results could have been very inaccurate for the population of doctors in California. 


CR3.7 Suppose, for example, the women had been allowed to choose whether or not they participated in 
the program. Then it is quite possible that generally speaking those women with more social 
awareness would have chosen to participate, and those with less social awareness would have 
chosen not to. Then it would be impossible to tell whether the stated results came about as a result 
of the program or of the greater social awareness amongst the women who participated. By 
randomly assigning the women to participate or not, comparable groups of women would have 
been obtained. 


CR3.9 a [Graph: Pass Rate (%) by District (San Luis Obispo High School, San Luis Obispo County, State of California)] 
b Between 2002 and 2003 and between 2003 and 2004 the pass rates rose for both the high 
school and the state, with a particularly sharp rise between 2003 and 2004 for the state. 
However, for the county the pass rate fell between 2002 and 2003 and then rose between 
2003 and 2004. 
CR3.11 
a 
0 | 123334555599 
1 | 00122234688 
2 | 0112344497 
3 | 0113338 
4 | 357           Stem: Thousands 
SuE2 oT 0.8 Leaf: Hundreds 
The stem-and-leaf display shows a positively skewed distribution with a single peak. There 
are no extreme values. A typical total length is around 2100 and the great majority of total 
lengths lie in the 100 to 3800 range. 
b [Histogram: Frequency against Total Length, 0 to 6000] 
c The number of subdivisions that have total lengths less than 2000 is 12 + 11 = 23, and so the 
proportion of subdivisions that have total lengths less than 2000 is 23/47 = 0.489. 
The number of subdivisions that have total lengths between 2000 and 4000 is 10 + 7 = 17, 
and so the proportion of subdivisions that have total lengths between 2000 and 4000 is 17/47 
= 0.362. 
CR3.13 


The histogram shows a smooth positively skewed distribution with a single peak. A typical time 
difference between the two phases of the race is 150 seconds, with the majority of time 
differences lying between 50 and 350 seconds. There are around three values that could be 
considered extreme, with those values lying in the 650 to 750 range. Estimating the frequencies 
from the histogram we see that approximately 920 runners were included in the study and that 
approximately 8 of those runners ran the late distance more quickly than the early distance 
(indicated by a negative time difference). Therefore the proportion of runners who ran the late 
distance more quickly than the early distance is approximately 8/920 = 0.009. 
CR3.15 
There is a strong negative linear relationship between racket resonance frequency and sum of 
peak-to-peak accelerations. There are two rackets whose data points are separated from the 
remaining data points. Those two rackets have very high resonance frequencies and their peak-to- 
peak accelerations are lower than those of all the other rackets. 
x̄ = (1480 + 1071 + 2291 + 1688 + 1124 + 3476 + 3701)/7 = $2118.71. 


To calculate the median, we first list the data values in order of size: 

1071 1124 1480 1688 2291 3476 3701 
The median is the middle value in this list, $1688. 
The mean is much larger than the median since the distribution of these seven values is positively 
skewed. The two largest values are greatly separated from the remaining five values. The median 
is better as a description of a typical value since it is not influenced by the two extreme values. 
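
The two measures of center can be checked with Python's standard `statistics` module. This is a minimal sketch using the seven repair costs listed above; the variable name is chosen here for illustration.

```python
from statistics import mean, median

repair_costs = [1480, 1071, 2291, 1688, 1124, 3476, 3701]  # dollars

print(round(mean(repair_costs), 2))  # mean, pulled upward by 3476 and 3701
print(median(repair_costs))          # middle value of the ordered list
```

Replacing 3701 by a much larger value would change the mean but leave the median at 1688, which is the sense in which the median is not influenced by extreme values.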


The mean caffeine concentration for the brands of coffee listed is 


(140 + 195 + 155 + 115 + 195 + 180 + 110 + 110 + 130 + 55 + 60 + 60)/12 = 125.417 mg/cup. 


Therefore the mean caffeine concentration of the coffee brands in mg/oz is (125.417)/8 = 15.677. 
This is significantly greater than the mean caffeine concentration of the energy drinks given in the 
previous exercise. 


The fact that the mean is so much greater than the median tells us that a small number of 
individuals who donate a large amount of time are greatly increasing the mean. 


a There are some unusually large circulation values that make the mean greater than the 
median. 


b The sum of the circulation values given is 13666304, and so the mean is 
(13666304)/20 = 683315.2. 


The values are already given in descending order, and so to find the median we only need to 
find the average of the two middle values: 
(438722 + 427771)/2 = 433246.5. 


c The median does the better job of describing a typical value as it is not affected by the 
small number of unusually large values in the data set. 


d This sample is in no way representative of the population of daily newspapers in the US since 
it consists of the top 20 newspapers in the country. 


a The sum of the values given is 8966, and so the mean is 8966/20 = 448.3. 


b Median = (446 + 446)/2 = 446. 


4.11 


4.13 


4.15 


4.17 


4.19 


c This sample represents the 20 days with the highest number of speeding-related fatalities, and 
so it is not reasonable to generalize from this sample to the other 345 days of the year. 


Neither statement is correct. Regarding the first statement it should be noted that, unless the 
“fairly expensive houses” constitute a majority of the houses selling, these more costly houses 
will not have an effect on the median. Turning to the second statement, we point out that the 
small number of very high or very low prices will have no effect on the median, whatever the 
number of sales. Both statements can be corrected by replacing the median with the mean. 


The two possible solutions are x, =32 and x, = 39.5. 


The two measures of center that can be calculated are the median and the trimmed mean. 
To find the median we first list the data values in order: 

170 290 350 480 570 790 860 920 1000+ 1000+ 
The median is the mean of the two middle values: (570 + 790)/2 = 680 hours. 


The 20% trimmed mean is (350 + 480 + 570 + 790 + 860 + 920)/6 = 661.667 hours. 
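
The reason these two measures can still be computed is that neither uses the actual values of the two censored observations (recorded as 1000+). The Python sketch below makes this explicit by storing the censored values as infinity, which is an assumption made purely for illustration, as is the function name.

```python
from statistics import median

def trimmed_mean(values, proportion):
    """Mean after removing the given proportion of values from each end."""
    ordered = sorted(values)
    k = round(proportion * len(ordered))  # number trimmed from each end
    kept = ordered[k: len(ordered) - k]
    return sum(kept) / len(kept)

# The two observations recorded as 1000+ are represented by infinity.
lifetimes = [170, 290, 350, 480, 570, 790, 860, 920, float("inf"), float("inf")]

print(median(lifetimes))                        # average of 570 and 790
print(round(trimmed_mean(lifetimes, 0.20), 3))  # mean of the middle six values
```

The ordinary mean of this list would be infinite, which reflects the fact that it cannot be calculated without knowing the two censored lifetimes.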


a x̄ = (29 + 62 + 37 + 41 + 70 + 82 + 47 + 52 + 49)/9 = 52.111. 
Variance = ((29 − 52.111)² + ··· + (49 − 52.111)²)/8 = 279.111. 
s = √279.111 = 16.707. 
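
The same calculation can be written out step by step in Python and compared with the standard library. This is a sketch using the nine data values from this solution; the variable names are chosen here for illustration.

```python
from statistics import stdev, variance

values = [29, 62, 37, 41, 70, 82, 47, 52, 49]

n = len(values)
x_bar = sum(values) / n
# Sample variance: sum of squared deviations divided by n - 1 (here 8).
s_squared = sum((x - x_bar) ** 2 for x in values) / (n - 1)
s = s_squared ** 0.5

print(round(x_bar, 3), round(s_squared, 3), round(s, 3))
# The library functions use the same n - 1 divisor.
print(round(variance(values), 3), round(stdev(values), 3))
```

Note that the unrounded mean is used for the deviations, in line with the note on rounding at the start of this manual.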


b The addition of the very expensive cheese would increase the values of both the mean and the 
standard deviation. 


a The complete data set, listed in order, is: 
19 28 30 41 43 46 48 49 53 mis, 54 
62 67 71 (i) 
Lower quartile = 4th value = 41. Upper quartile = 12th value = 62. iqr = 62 − 41 = 21. 


b The iqr for cereals rated good (calculated in exercise 4.18) is 24. This is greater than the value 
calculated in Part (a). 


a x̄ = (1480 + ··· + 3701)/7 = 2118.71429 
Variance = ((1480 − 2118.71429)² + ··· + (3701 − 2118.71429)²)/6 = 1176027.905. 
Standard deviation = √1176027.905 = 1084.448. 


The fairly large value of the standard deviation tells us that there is considerable variation 
between the repair costs. 


b For minivans, mean = 1355.833, variance = 93698.967, and standard deviation = 306.103. 
The mean repair cost for minivans is less than for the smaller cars, showing a lower typical 
repair cost for the minivans. The standard deviation for minivans is considerably less than for 
the smaller cars, showing a lower repair cost variability for the minivans. 
4.23 a The data values, listed in order, are: 


0 0 0 0 0 0 0 oy 71 83 106 
130 142 142 165 ae 189 189 1895 97201 odd 224 
2305236" #306 


Lower quartile = average of 6th and 7th values = (0 + 0)/2 = 0. 
Upper quartile = average of 19th and 20th values = (189 + 201)/2 = 195. 
Interquartile range = 195 − 0 = 195. 


b The lower quartile is equal to the minimum value for this data set because there are a large 
number of equal values (zero in this case) at the lower end of the distribution. In most data 
sets this is not the case and therefore, generally speaking, the lower quartile is not equal to the 
minimum value. 


4.25 This data set would have a large standard deviation because parents differ greatly in the amount 
of money they spend. 


4.27 a x̄ = (141 + ··· + 70)/10 = 147.5. 
Variance = ((141 − 147.5)² + ··· + (70 − 147.5)²)/9 = 2505.83333. 
Standard deviation = √2505.83333 = 50.058. 
b The Memorial Day data are a great deal more consistent than the New Year’s Day data, and 
therefore the standard deviation for Memorial Day would be smaller than the standard 
deviation for New Year’s Day. 


c The standard deviations are given in the table below. 


[Table: Holiday and Standard Deviation; the six values are listed in the paragraph that follows] 


The standard deviations for Memorial Day, Labor Day, and Thanksgiving are 18.224, 17.725, 
and 15.312, respectively. The standard deviations for the other three holidays are 50.058, 
47.139, and 52.370. The standard deviations for the same day of the week holidays are all 
smaller than all of the standard deviations for the holidays that can occur on different days. 
There is less variability for the holidays that always occur on the same day of the week. 


4.29 a The average price for the combined areas would have to take into account the fact that more 
houses were sold in Los Osos than in Morro Bay. 


b The results for Paso Robles are likely to have the higher standard deviation since the range 
for Paso Robles (1,575,000 − 170,000 = 1,405,000) is greater than the range for Grover 
Beach (720,000 − 242,000 = 478,000). 
c Assuming that the distributions of house prices are roughly symmetrical, we would expect the 
median price for Grover Beach to be around (720,000 + 242,000)/2 = 481,000 and the median 
price for Paso Robles to be around (1,575,000 + 170,000)/2 = 872,500. We expect Paso 
Robles to have the higher median price. 


[Table: standard deviation and coefficient of variation for the two samples; surviving entries: 7.81, 49.68] 


b The values of the coefficient of variation are given in the table in Part (a). The fact that the 
coefficient of variation is smaller for Sample 2 than for Sample 1 is not surprising since, 
relative to the actual amount placed in the containers, it is easier to be accurate when larger 
amounts are being placed in the containers. 


4.31 a 


4.33 a Median = average of 25th and 26th values = (57.3 + 58.7)/2 = 58. 
Lower quartile = 13th value = 53.5. 
Upper quartile = 38th value = 64.4. 


b (Lower quartile) − 1.5(iqr) = 53.5 − 1.5(10.9) = 37.15. 
Since 28.2 and 35.7 are both less than 37.15, they are both outliers. 


[Modified boxplot: Percent of Population Born in State and Still Living There, axis from 30 to 80] 


The median percent of population born in the state and still living there is 58. There are two 
outliers at the lower end of the distribution. If those two values are disregarded the 
distribution is roughly symmetrical, with values ranging from 40.4 to 75.8. 


[Boxplot: Maximum Wind Speed (m/s), axis from 20 to 60] 


No, the boxplot is not roughly symmetric. It is positively skewed. 


4.37 a Since there are outliers in the data set (152 and 43), it would be more appropriate to use the 
interquartile range than the standard deviation. 


b Lower quartile = 81.5, upper quartile = 94, iqr = 12.5. 
(Lower quartile) − 3(iqr) = 81.5 − 3(12.5) = 44 
(Lower quartile) − 1.5(iqr) = 81.5 − 1.5(12.5) = 62.75 
(Upper quartile) + 1.5(iqr) = 94 + 1.5(12.5) = 112.75 
(Upper quartile) + 3(iqr) = 94 + 3(12.5) = 131.5 


Since the value for students (152) is greater than 131.5, this is an extreme outlier. 
Since the value for farmers (43) is less than 44, this is an extreme outlier. 
There are no non-extreme outliers. 
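
The outlier boundaries and the classification of the two unusual values can be reproduced with a short Python sketch. The function names and the label "mild outlier" (for an outlier that is not extreme) are naming choices made here, not part of the exercise.

```python
def outlier_fences(lower_quartile, upper_quartile):
    """Boundaries at 1.5 and 3 interquartile ranges beyond the quartiles."""
    iqr = upper_quartile - lower_quartile
    return {
        "extreme_low": lower_quartile - 3 * iqr,
        "mild_low": lower_quartile - 1.5 * iqr,
        "mild_high": upper_quartile + 1.5 * iqr,
        "extreme_high": upper_quartile + 3 * iqr,
    }

def classify(value, fences):
    """Label a value as an extreme outlier, a mild outlier, or neither."""
    if value < fences["extreme_low"] or value > fences["extreme_high"]:
        return "extreme outlier"
    if value < fences["mild_low"] or value > fences["mild_high"]:
        return "mild outlier"
    return "not an outlier"

fences = outlier_fences(81.5, 94)
print(fences)
print("students (152):", classify(152, fences))
print("farmers (43):", classify(43, fences))
```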


[Modified boxplot: Accidents per 1000, axis from 50 to 150] 


d The insurance company might decide only to offer discounts to occupations that are outliers 
at the lower end of the distribution, in which case only farmers would receive the discount. If 
the company was willing to offer discounts to the quarter of occupations with the lowest 
accident rates then the last 10 occupations on the list should be the ones to receive discounts. 


4.39 a Since the values given are 1 standard deviation above and below the mean, roughly 68% of 
speeds would have been between those two values. 


b (1 − 0.68)/2 = 0.16. Roughly 16% of speeds would exceed 57 mph. 


4.41 a The values given are two standard deviations below and above the mean. Therefore by 
Chebyshev’s Rule at least 75% of observations must lie between those two values. 


b By Chebyshev’s Rule at least 89% of observations must lie within 3 standard deviations of 
the mean. So the required interval is 36.92 ± 3(11.34) = (2.90, 70.94). 
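
Both percentages come from the Chebyshev bound 1 − 1/k², where k is the number of standard deviations. A brief Python check follows; the function name is chosen here for illustration, and the mean and standard deviation are the values used in Part (b).

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

mean, sd = 36.92, 11.34

print(chebyshev_lower_bound(2))            # Part (a): at least 75%
print(round(chebyshev_lower_bound(3), 3))  # Part (b): at least about 89%
print(round(mean - 3 * sd, 2), round(mean + 3 * sd, 2))  # interval endpoints
```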


c If the distribution were approximately normal then roughly 2.5% of observations would be 
more than 2 standard deviations below the mean. However, here x̄ − 2s is negative, and so 
this cannot be the case. Therefore the distribution cannot be approximately normal. 


4.43 For the first test z = (625 — 475)/100 = 1.5 and for the second test z = (45 — 30)/8 = 1.875. Since 
the student’s z score in the second test is higher than in the first, the student did better relative to 
the other test takers in the second test. 


4.45 a The values given are 1 standard deviation below and above the mean, so approximately 68%
of the sample observations will be between those values.


b The values given are 2 standard deviations below and above the mean, so approximately 5%
of the sample observations will be outside the interval.


c Approximately (1 - 0.95)/2 = 0.025 of observations lie below 2000 and approximately
(1 - 0.68)/2 = 0.16 of observations lie below 2500. Therefore approximately 0.16 - 0.025 = 0.135
(13.5%) of observations lie between 2000 and 2500.


d Chebyshev’s Rule can only tell us that the required proportions are “at least” something or “at
most” something. The Empirical Rule estimates the actual proportions required.


4.47 We require the proportion of observations between 49.75 and 50.25. At 49.75,
z = (49.75 - 49.5)/0.1 = 2.5. Chebyshev’s Rule tells us that at most 1/2.5² = 0.16 of observations
lie more than 2.5 standard deviations from the mean. Therefore, since we know nothing about the
distribution of weight readings, the best conclusion we can reach is that at most 16% of weight
readings will be between 49.75 and 50.25.


4.49 The value of the standard deviation tells us that a typical deviation of the number of answers
changed from right to wrong from the mean of this variable is 1.5. However, 0 is only 1.4 below
the mean and negative values are not possible, and so for a typical deviation to be 1.5 there must
be some values more than 1.5 above the mean, that is, values above 2.9. This suggests that the
distribution is positively skewed.


The value 6 is the lowest whole number value more than 3 standard deviations above the mean.
Therefore, using Chebyshev’s Rule, we can conclude that at most 1/3² = 1/9 of students, that is, at
most 162/9 = 18 students, changed at least six answers from correct to incorrect.
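
The Chebyshev bound used in 4.47 and 4.49 can be checked with a short sketch. The function name `chebyshev_max_fraction_outside` is hypothetical; it simply evaluates 1/k², the largest possible fraction of observations more than k standard deviations from the mean.

```python
def chebyshev_max_fraction_outside(k):
    """Upper bound on the fraction of observations more than k sd from the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is only informative for k > 1")
    return 1 / k**2


# Exercise 4.49: at most 1/9 of the 162 students lie more than 3 sd from the mean.
n_students = 162
bound_449 = chebyshev_max_fraction_outside(3)
max_students = n_students * bound_449
print(bound_449, max_students)

# Exercise 4.47: at most 0.16 of readings lie more than 2.5 sd from the mean.
bound_447 = chebyshev_max_fraction_outside(2.5)
print(bound_447)
```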


4.51 a [Frequency distribution and histogram: Frequency against Per Capita Expenditure on Libraries, class intervals of width 2 (for example, 10 to <12), horizontal axis from 0 to 18]
There is no lower whisker because the minimum value and the lower quartile were both 1.


The minimum, the lower quartile, and the median are all equal because more than half of the 
data values were equal to the minimum value. 


The boxplot shows that 2 is between the median and the upper quartile. Therefore between 
25% and 50% of patients had unacceptable times to defibrillation. 


(Upper quartile) + 3(iqr) = 3 + 3(2) = 9. Since 7 is less than 9, 7 must be a mild outlier. 


x̄ = (497 + ⋯ + 270)/7 = 287.714
The seven deviations are 
209.286, -94.714, 40.286, -132.714, 38.286, -42.714, -17.714. 


The sum of the rounded deviations is 0.002. 


Variance = ((497 - 287.71429)² + ⋯ + (270 - 287.71429)²)/6 = 12601.905.


s = √12601.905 = 112.258.
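
As a check on the arithmetic, the mean, deviations, variance, and standard deviation can be recomputed as follows. This is a sketch under one assumption: the seven data values are reconstructed here by adding each listed deviation to the mean 287.714, since only the first and last values (497 and 270) appear explicitly in the solution.

```python
import math

# Assumed data: each value is (listed deviation + mean), rounded to a whole number.
values = [497, 193, 328, 155, 326, 245, 270]

n = len(values)
mean = sum(values) / n
deviations = [v - mean for v in values]

# Sample variance divides by n - 1, as in the solution (divisor 6).
variance = sum(d**2 for d in deviations) / (n - 1)
std_dev = math.sqrt(variance)

print(round(mean, 3), round(variance, 3), round(std_dev, 3))
```

The unrounded deviations sum to zero (up to floating-point error); the 0.002 reported in the solution arises only because the deviations were rounded before being added.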


This is the median, and its value is (4443 + 4129)/2 = $4286. The other measure of center is the mean, 
and its value is $3968.67. This is smaller than the median and therefore less favorable to the supervisors. 


a 


b 


This is a correct interpretation of the median. 


Here the word “range” is being used to describe the interval from the minimum value to the 
maximum value. The statement claims that the median is defined to be the midpoint of this 
interval, which is not true. 


If there is no home below $300,000 then certainly the median will be greater than $300,000 
(unless more than half of the homes cost exactly $300,000). 


The new mean is x̄ = (52 + ⋯ + 73)/11 = 38.364.
The new values and their deviations from the mean are shown in the table below. 


[Table: Value and Deviation columns; entries not recoverable]
The deviations are the same as the deviations in the original sample. Therefore the value of s²
for the new values is the same as for the old values. In general, subtracting (or adding) the same
number from/to each observation has no effect on s² or on s, since the mean is decreased (or
increased) by the same amount as the values, and so the deviations from the mean remain the
same.


4.63 a Lower quartile = 44, upper quartile = 53, iqr = 9. 


(Lower quartile) — 1.5(iqr) = 44 — 1.5(9) = 30.5 
(Upper quartile) + 1.5(iqr) = 53 + 1.5(9) = 66.5 


Since there are no data values less than 30.5 and no data values greater than 66.5, there are no 
outliers in this data set. 


b [Boxplot: Percentage of Juice Lost, horizontal axis from 30 to 60]


The median of the distribution is 46. The middle 50% of the data range from 44 to 53 and the
whole data set ranges from 33 to 60. There are no outliers. The lower half of the middle 50%
of data values shows less spread than the upper half of the middle 50% of data values. The
spread of the lowest 25% of data values is slightly greater than the spread of the highest 25%
of data values.

4.65 a x̄ = (244 + ⋯ + 200)/14 = 192.571. This is a measure of center that incorporates all the
sample values.
The data values, listed in order, are:


160 174 176 180 180 183 187
191 194 200 205 211 211 244


Median = average of 7th and 8th values = (187 + 191)/2 = 189. This is a measure of center
that is the “middle value” in the sample.


b The mean would decrease and the median would remain the same. 


c Trimmed mean = (174 + ⋯ + 211)/12 = 191.
Trimming percentage = (1/14)(100) = 7.1%.
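
The trimmed mean in Part (c) can be reproduced with the sketch below. The helper `trimmed_mean` is a hypothetical name, and the data list assumes the eighth and tenth ordered values are 191 and 200, which is consistent with the stated median (189) and mean (192.571).

```python
def trimmed_mean(values, n_trim):
    """Mean after removing the n_trim smallest and n_trim largest values."""
    ordered = sorted(values)
    kept = ordered[n_trim:len(ordered) - n_trim]
    return sum(kept) / len(kept)


data = [160, 174, 176, 180, 180, 183, 187, 191, 194, 200, 205, 211, 211, 244]

tm = trimmed_mean(data, 1)
trimming_percentage = 100 * 1 / len(data)
print(tm, round(trimming_percentage, 1))

# Part (d): changing 244 to 204 lowers the trimmed mean; changing it to 284 does not.
tm_204 = trimmed_mean([204 if v == 244 else v for v in data], 1)
tm_284 = trimmed_mean([284 if v == 244 else v for v in data], 1)
print(tm_204, tm_284)
```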


d If 244 is changed to 204 then the largest observation is now 211, and one value of 211 will be
eliminated from the calculation. This makes the largest three data values in the calculation 
204, 205, 211, as compared to 205, 211, 211 in the previous calculation. Therefore the 
trimmed mean will decrease. If 244 is changed to 284, then there is no change in the 
trimmed mean. 
4.67 [Boxplot: Aluminum Contamination (ppm), horizontal axis from 0 to 500]
The median aluminum contamination is 119. There is one (extreme) outlier, a value of 511. 
Disregarding the outlier the data values range from 30 to 291. The middle 50% of data values 
range from 87 to 182. Even disregarding the outlier the distribution is positively skewed. 
4.69 [Comparative boxplots for Budget, Midrange, and First-class hotels: Franchise Cost as Percentage of Total Room Revenue]


The medians for the three different types of hotel are roughly the same, the median for the 
midrange hotels being slightly higher than the other two medians. The midrange hotels have two 
outliers (one extreme) at the lower end of the distribution and the first-class hotels have one 
(extreme) outlier at the lower end. There are no outliers for the budget hotels. If the outliers are 
taken into account then the midrange and first-class groups have a greater range than the budget 
group. If the outliers are disregarded then the budget group has a much greater spread than the 
other two groups. If the outliers are taken into account then all three distributions are negatively 
skewed. If the outliers are disregarded then the distribution for the budget group is negatively 
skewed while the distributions for the other two groups are positively skewed. 


4.71 The fact that the mean is greater than the median suggests that the distribution is positively 
skewed. 


4.73 a The distribution is roughly symmetrical and 0.84 = 1 - 0.16, and so the 84th percentile is the
same distance above the mean as the 16th percentile is below the mean. The 16th percentile is
20 units below the mean and so the 84th percentile is 20 units above the mean. Therefore the
84th percentile is 120.


b The proportion of scores below 80 is 16% and the proportion above 120 is 16%. Therefore
the proportion between 80 and 120 is 100 - 2(16) = 68%. So by the Empirical Rule 80 and
120 are both 1 standard deviation from the mean, which is 100. This tells us that the standard
deviation is approximately 20.


c z = (90 - 100)/20 = -0.5.


d A score of 140 is 2 standard deviations above the mean. By the Empirical Rule approximately
5% of scores are more than 2 standard deviations from the mean. So approximately 5/2 =
2.5% of scores are greater than 140. Thus 140 is at approximately the 97.5th percentile.


e A score of 40 is 3 standard deviations below the mean, and so the proportion of scores below
40 would be approximately (100 - 99.7)/2 = 0.15%. Therefore there would be very few
scores below 40.
Chapter 5
Summarizing Bivariate Data


Positive. As temperatures increase, cooling costs are likely to increase. 
Negative. As interest rates rise, fewer people are likely to apply for loans. 


Positive. Husbands and wives tend to come from similar backgrounds, and therefore have 
similar expectations in terms of income. 


Close to zero. There is no reason to believe that there is an association between height and IQ.
Positive. People with large feet tend to be taller than people with small feet. 


Positive. People who are smart and/or well educated tend to do well on both sections, with 
those lacking these attributes doing less well on both sections. 


Negative. Those who spend a lot of time on their homework are likely to spend little time 
watching television, and vice versa. 


Close to zero. The points in the scatterplot will form an inverted “U” shape, making a 
correlation close to zero. 


Scatterplot for which r = 1:


[Scatterplot: points lying exactly on a straight line with positive slope]


Scatterplot for which r = -1:


[Scatterplot: points lying exactly on a straight line with negative slope]


5.5 a Using a calculator or statistical software package we get r = 0.204. There is a weak positive
linear relationship between cost per serving and fiber per serving. 


b Using a calculator or statistical software package we get r = 0.241. This correlation 
coefficient is slightly greater than the correlation coefficient for the per serving data. 


5.7 The fact that the correlation coefficient for college GPA and academic self worth was 0.48 tells
us that among these athletes there was a weak to moderate positive linear relationship between 
GPA and self worth. Those with higher grades tended to feel better about themselves in an 
academic sense than those with lower grades. The correlation coefficient of 0.46 between college 
GPA and high school GPA gives us the same information about those variables. However, the 
correlation coefficient of —0.36 between college GPA and the procrastination measure tells us that 
there was a weak negative linear relationship between those variables. Those who had a tendency 
to procrastinate generally speaking had lower grades than those without that tendency. 


5.9 a Using a calculator or statistical software package we get r = 0.118. 


b [Scatterplot: Household Debt against Consumer Debt, horizontal axis from 6.0 to 8.0]


Yes. Looking at the scatterplot there does not seem to be a strong relationship between the 
variables. 


© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 




5.11 [Calculation of the correlation coefficient r from summary quantities; not recoverable]


There is a strong positive linear relationship between the concentrations of neurolipofuscin in the
right and left eye stalks.


5.13 The time needed is related to the speed by the equation


time = distance/speed


where the distance is constant. Using this relationship, and plotting the times (over a fixed
distance) for various feasible speeds, a scatterplot is obtained like the one below.


[Scatterplot: Time against Speed]


These points show a strong negative correlation, and therefore the correlation coefficient is most 
likely to be close to —0.9. 
[Scatterplot: Net Directionality against Mean Temperature, horizontal axis from 5.0 to 20.0]


There is one point, (8.06, 0.25), which is separated from the general pattern of the data. If this 
point is disregarded then there is a somewhat strong positive linear relationship between 
mean temperature and net directionality. Even if this point is included, there is still a 
moderate linear relationship between the two variables. 


Using a calculator or statistical software package we find that the equation of the least-squares
regression line is ŷ = -0.14282 + 0.016141x, where x = mean water temperature and
y = net directionality.
When x = 15, ŷ = -0.14282 + 0.016141(15) = 0.0993.


The scatterplot and the least-squares line support the fact that, generally speaking, the higher 
the temperature the greater the proportion of larvae that were captured moving upstream. 


Approximately the same number of larvae moving upstream as downstream is represented by 
a net directionality of zero. According to the least-squares line this will happen when the 
mean temperature is approximately 8.8°C. 
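
The prediction at x = 15 and the temperature at which the predicted net directionality is zero can both be verified with a short sketch. The coefficients are the ones quoted in the solution; the names `predict` and `zero_temperature` are hypothetical.

```python
# Least-squares coefficients quoted in the solution.
intercept = -0.14282
slope = 0.016141


def predict(mean_temperature):
    """Predicted net directionality for a given mean water temperature."""
    return intercept + slope * mean_temperature


# Setting the prediction to zero and solving for x gives x = -intercept / slope.
zero_temperature = -intercept / slope

print(round(predict(15), 4))
print(round(zero_temperature, 1))
```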


The dependent variable is the number of fruit and vegetable servings per day, and the 
predictor variable is the number of hours of television viewed per day. 


Negative. As the number of hours of TV watched per day increases, the number of fruit and 
vegetable servings per day (on average) decreases. 
[Scatterplot: Cerebral Grey Matter (ml) at 2-5 yr against Head Circumference z Score at 6-14 months]


Using a calculator or statistical software package we find that r = 0.786. 


Using a calculator or statistical software package we find that the equation of the least-squares
regression line is ŷ = 714.1470 + 42.5196x, where x = head circumference z score
and y = volume of grey matter at 2 to 5 years.


If x = 1.8, then ŷ = 714.1470 + 42.5196(1.8) = 790.682 ml.


The value x = 3.0 is substantially outside the range of the x-values in the data set, and we do 
not know that the observed linear pattern continues outside this range. Therefore it would not 
be a good idea to use the least-squares line to predict the y-value when x = 3.0. 


Since the slope of the least-squares line is —9.30, we can say that every extra minute waiting for 
paramedics to arrive with a defibrillator lowers the chance of survival by 9.3 percentage points. 
(To say that each minute of waiting “lowers the chances of survival by 10 percent” means that 
one tenth of the probability of surviving is removed for every extra minute of waiting. For 
example, if the chance of survival after 8 minutes of waiting were 25%, it would mean that the 
chance of surviving after 9 minutes of waiting was 25 — 2.5 = 22.5%. This is not the case here.) 


a 


Using a calculator or a statistical software package we find that the correlation coefficient 
between sale price and size is 0.700. There is a moderate linear relationship between sale 
price and size. 


Using a calculator or a statistical software package we find that the correlation coefficient 
between sale price and land-to-building ratio is —0.332. There is a weak negative linear 
relationship between sale price and land-to-building ratio. 


Size is the better predictor of sale price since the absolute value of the correlation between
sale price and size is closer to 1 than the absolute value of the correlation between sale price
and land-to-building ratio.
d Using a calculator or statistical software package we find that the least-squares regression line
for predicting y = sale price from x = size is ŷ = 1.3281 + 0.0053x.


5.25 The least-squares line is based on the x values contained in the sample. We do not know that the 
same linear relationship will apply for x values outside this range. Therefore the least-squares line 
should not be used for x values outside the range of values in the sample. 


5.27 We know (as stated in the text) that b = r(s_y/s_x), where s_y and s_x are the standard deviations of
the y values and the x values, respectively. Since standard deviations are always positive we know
that b and r must always have the same sign.


5.29 a [Scatterplot: Median Distance Walked against Representative Age, horizontal axis from 5.0 to 17.5]


The scatterplot shows a linear pattern between the representative ages of 10 and 17, but there 
is a greater increase in the median distance walked between the representative ages of 7 and 
10 than there is between any other two consecutive age groups. 


b Using a calculator or statistical software package we find that the equation of the least-squares
regression line is ŷ = 492.79773 + 14.76333x, where x is the representative age and y
is the median distance walked.


Representative   Median Distance   Predicted   Residual
Age (x)          Walked (y)        ŷ
 4.0             544.3             551.851      -7.551
 7.0             584.0             596.141     -12.141
10.0             667.3             640.431      26.869
13.5             701.1             692.103       8.997
17.0             727.6             743.774     -16.174
[Residual plot: Residual against Representative Age, horizontal axis from 5.0 to 17.5]


The residual plot reflects the sharp increase in the median distance walked between the 
representative ages of 7 and 10, with a clear negative residual at x =7 and large positive 
residual at x=10. 


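5.31 a [Table of Region, Pollution (x), Medical Cost, predicted value, and residual; only the predicted values and residuals below are recoverable]


Predicted    Residual
941.47       -26.47
933.0262     -42.0262
931.6189      36.3811
956.4812      15.5188
939.5936      12.4064
894.56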


b r = -0.581. Since the absolute value of r is just a little larger than 0.5 we can describe the
linear relationship between pollution and medical cost as moderate.
c [Residual plot: Residual against Pollution, horizontal axis from 25.0 to 40.0]


There is one point whose x value is far greater than those of the other points, suggesting that 
this point might be influential. 


d Including the point for the West, the slope of the least-squares line is -4.691 and the intercept
is 1082.244. If we remove this point, the resulting slope is —7.107 and the intercept is 
1154.371. There is a substantial change in the slope, and therefore the point is influential. 


5.33 a [Scatterplot: Percent Transported against Total Number, horizontal axis from 0 to 20000]


Yes, there appears to be a strong linear relationship between the total number of salmon in the 
stream and the percent of salmon killed by bears that are transported away from the stream. 


b The equation of the least-squares regression line is ŷ = 18.483 + 0.00287x, where x is the
total number of salmon in a creek and y is the percent of salmon killed by bears that were
transported away from the stream prior to the bear eating. The regression line has been drawn
on the scatterplot in Part (a).


c The point (3928, 46.8) is unlikely to be influential as its x value does not differ greatly from
the others in the data set.


d The two points are not influential since the least-squares line provides a good fit for the
remaining 8 points. Removing the two points will make only a small change in the regression
line.


e s_e = 9.16217. This is a typical deviation of a percent transported value from the value
predicted by the regression line.


f r² = 0.832. This is a large value of r², and means that 83.2% of the variation in the percent
transported values can be attributed to the approximate linear relationship between total
number and percent transported.


5.35 Using a calculator or statistical software package we find that r² = 0.948 and s_e = 20.566. The
value of r² tells us that 94.8% of the variation in six-minute walk time can be attributed to the
approximate linear relationship represented by the least-squares line. Since 0.948 is close to 1, the
value shows that the fit of the least-squares line to the points is very good. The value of s_e,
20.566, is a typical deviation of a six-minute walk time from the time predicted by the least-squares
line.
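
For readers without a statistics package, the values of r² and s_e can be computed directly from the definitions r² = 1 - SSResid/SSTo and s_e = √(SSResid/(n - 2)). The sketch below assumes the five (representative age, median distance walked) pairs implied by the fitted values and residuals in the solution to Exercise 5.29; treat the data list as a reconstruction rather than as the textbook's table.

```python
import math

# Assumed data, reconstructed from the fitted values and residuals in 5.29.
ages = [4, 7, 10, 13.5, 17]
distances = [544.3, 584.0, 667.3, 701.1, 727.6]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(distances) / n

# Least-squares slope and intercept.
s_xx = sum((x - x_bar) ** 2 for x in ages)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, distances))
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual and total sums of squares.
residuals = [y - (a + b * x) for x, y in zip(ages, distances)]
ss_resid = sum(e**2 for e in residuals)
ss_total = sum((y - y_bar) ** 2 for y in distances)

r_squared = 1 - ss_resid / ss_total
s_e = math.sqrt(ss_resid / (n - 2))

print(round(a, 5), round(b, 5))
print(round(r_squared, 3), round(s_e, 3))
```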

5.37 a The value of r² would be 0.154.
b No, since the r² value for y = first year college GPA and x = SAT II score was 0.16, which is
not large. Only 16% of the variation in first year college GPA could be attributed to the
approximate linear relationship between SAT II score and first year college GPA.


5.39 a [Scatterplot: Number of Employees against Total Park Size (Acres), horizontal axis from 0 to 700000]


b Using a graphing calculator or computer software package we see that the equation of the
least-squares line is ŷ = 85.334 - 0.0000259x, where x is the total park size in acres and y is
the number of employees, and also that the value of r² for these two variables is 0.016. With
only 1.6% of the variation of the number of employees being attributable to the least-squares
line, the line will not give accurate predictions.


c Deleting the point (620231, 67), the equation of the least-squares line is now
ŷ = 83.402 + 0.0000387x. Yes, removal of the point does greatly affect the equation of the
line, since it changes the slope from negative to positive.


5.41 The coefficient of determination is r² = 1 - (SSResid/SSTo) = 1 - (1235.470/25321.368) = 0.951.


This tells us that 95.1% of the variation in hardness is attributable to the approximate linear 
relationship between time elapsed and hardness. 


5.43 a The value of r that makes s_e ≈ s_y is 0. The least-squares line is then ŷ = ȳ.


b For values of r close to 1 or -1, s_e will be much smaller than s_y.


c s_e ≈ √(1 - r²) s_y = √(1 - 0.8²)(2.5) = 1.5.


d We now let x = 18-year-old height and y = 6-year-old height. The slope is
b = r(s_y/s_x) = 0.8(1.7/2.5) = 0.544. So the equation is ŷ = a + bx = a + 0.544x. The line
passes through (x̄, ȳ) = (70, 46), so 46 = a + 0.544(70), from which we find that
a = 46 - 0.544(70) = 7.92. Hence the equation of the least-squares line is ŷ = 7.92 + 0.544x.


Also, s_e ≈ √(1 - r²) s_y = √(1 - 0.8²)(1.7) = 1.02.
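
The Part (d) calculation builds the least-squares line from summary statistics alone. A minimal sketch of those steps, using the values quoted in the solution (r = 0.8, standard deviations 2.5 and 1.7, means 70 and 46), is given below; the variable names are illustrative only.

```python
import math

r = 0.8
s_x = 2.5   # standard deviation of x = 18-year-old height
s_y = 1.7   # standard deviation of y = 6-year-old height
x_bar, y_bar = 70, 46

# Slope from b = r(s_y/s_x); intercept from the fact that the line passes through (x_bar, y_bar).
b = r * (s_y / s_x)
a = y_bar - b * x_bar

# Approximate standard deviation about the line.
s_e = math.sqrt(1 - r**2) * s_y

print(round(b, 3), round(a, 2), round(s_e, 2))
```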


5.45 a The equation of the least-squares quadratic curve is ŷ = 0.8660 - 0.008452x + 0.000410x²,
where x = percent sunflower meal and y = feed intake.


b When x = 20, ŷ = 0.8660 - 0.008452(20) + 0.000410(20)² = 0.861.


5.47 a [Scatterplot: Sparrow Density against Field Strength, horizontal axis from 0.0 to 3.5]


The relationship between sparrow density and field strength appears to be nonlinear.


b When y is plotted against √x the following scatterplot and residual plot are obtained.


[Scatterplot: Sparrow Density against square root of field strength]


[Residual plot: Residual against square root of field strength]


When y is plotted against log(x) the following scatterplot and residual plot are obtained.


[Scatterplot: Sparrow Density against Log(Field Strength)]


[Residual plot: Residual against Log(Field Strength)]


When x′ = √x there is slight evidence of a curve in the residual plot, but when x′ = log(x)
there is no evidence of a curve in the residual plot. Thus x′ = log(x) is the preferable
transformation.


c The equation of the least-squares line is ŷ = 14.80508 - 24.28005 · log(x).


d When x = 0.5, ŷ = 14.80508 - 24.28005 · log(0.5) = 22.114.
When x = 2.5, ŷ = 14.80508 - 24.28005 · log(2.5) = 5.143.
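
The two predictions in Part (d) can be reproduced as follows. Note that log here is the base-10 logarithm; that reading is an inference from the fact that base 10 reproduces the stated answers. The function name `predict_density` is hypothetical.

```python
import math


def predict_density(field_strength):
    """Predicted sparrow density from the fitted line in log10(field strength)."""
    return 14.80508 - 24.28005 * math.log10(field_strength)


print(round(predict_density(0.5), 3))
print(round(predict_density(2.5), 3))
```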


5.49 Both x and y have been transformed “down,” and a roughly linear pattern has been obtained. Thus 
a scatterplot of the untransformed data would resemble segment 3 in Figure 5.38. 
5.51 a [Scatterplot: Sunshine Index (vertical axis from 10.850 to 10.975) against Cloud Cover Index (horizontal axis from 0.0 to 0.5)]


Initially, as the cloud cover index (x) increases from zero, the values of the sunshine index (y)
rise. Then, between x = 0.2 and x = 0.3, the y values seem to decrease sharply, and then to
increase again from that point. Certainly neither a linear nor a quadratic model could
adequately fit that pattern; however, a cubic regression could go some way to modeling the
data.


b The least-squares cubic function is ŷ = 10.8768 + 1.4604x - 7.2590x² + 9.2342x³, where x is
the cloud cover index and y is the sunshine index.


c [Table of Cloud Cover Index (x), Sunshine Index (y), and residuals; entries not recoverable]


[Residual plot: Residual against Cloud Cover Index, horizontal axis from 0.0 to 0.5]


There seems to be a random pattern in the residual plot, suggesting that the cubic regression 
was appropriate. 


d When x = 0.25, ŷ = 10.8768 + 1.4604(0.25) - 7.2590(0.25)² + 9.2342(0.25)³ = 10.932.


e When x = 0.45, ŷ = 10.8768 + 1.4604(0.45) - 7.2590(0.45)² + 9.2342(0.45)³ = 10.905.


f The value 0.75 is well outside the range of the original x values, and we do not know that the 
cubic relationship that we calculated applies outside this range. 
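
Evaluating the fitted cubic at the two in-range values is a direct substitution, sketched below with the coefficients quoted in Part (b). The function name `predict_sunshine` is hypothetical, and the range check simply encodes the caution in Part (f), assuming the observed cloud cover index values span roughly 0.0 to 0.5 as on the plot axis.

```python
def predict_sunshine(cloud_cover, x_min=0.0, x_max=0.5):
    """Evaluate the fitted cubic; refuse to extrapolate outside [x_min, x_max]."""
    if not x_min <= cloud_cover <= x_max:
        raise ValueError("x is outside the range of the data; do not extrapolate")
    x = cloud_cover
    return 10.8768 + 1.4604 * x - 7.2590 * x**2 + 9.2342 * x**3


print(round(predict_sunshine(0.25), 3))
print(round(predict_sunshine(0.45), 3))
```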


5.53 a [Scatterplot: Number Waiting for Transplant (Thousands) against Year]


From 1990 to 1999 the number of people waiting for organ transplants increased, with the 
number increasing by greater amounts each year. 
b After much trial and error it is found that the transformation y′ = y^0.15 (with x′ = x) produces
a linear pattern in the scatterplot and a random pattern in the residual plot. The scatterplot is
shown below.


[Scatterplot: (Number Waiting)^0.15 against Year]


c The least-squares line relating the transformed variables is ŷ′ = 1.552753 + 0.034856x,
where x is the year (1990 represented by 1) and y is the number waiting (in thousands).
When x = 11, ŷ′ = 1.552753 + 0.034856(11) = 1.936164. From this we get
ŷ = (1.936164)^(1/0.15) = 81.837. The least-squares line predicts that in 2000 the number of
patients waiting will be around 81,800.


d We have to be confident that the pattern observed between 1990 and 1999 will continue up to
2000. This is reasonable so long as circumstances remain basically the same. To expect the 
same pattern to continue to 2010, however, would be unreasonable. 


5.55 [Scatterplot: Canal Length (mm) against Age (years)]


The relationship between age and canal length is not linear. A transformation that makes the plot
roughly linear is x′ = 1/√x (with y′ = y). The resulting scatterplot and residual plot are shown
below.


[Scatterplot: Canal Length (mm) against Age^(-0.5), horizontal axis from 0.50 to 2.00]


[Residual plot: Residual against Age^(-0.5), horizontal axis from 0.50 to 2.00]

5.57 Calculating the least-squares line for y′ = ln(p/(1 - p)) against x = high school GPA we get
y′ = -2.89399 + 1.70586x. Thus the logistic regression equation is


p = e^(-2.89399 + 1.70586x) / (1 + e^(-2.89399 + 1.70586x)).


For x = 2.2 the equation predicts


p = e^(-2.89399 + 1.70586(2.2)) / (1 + e^(-2.89399 + 1.70586(2.2))) = 0.702.
5.59 a [Scatterplot: Lowland Proportion against Exposure (days), horizontal axis from 0 to 8]
[Scatterplot: Mid-Elevation Proportion against Exposure (days)]
Yes, the plots have roughly the shape you would expect from “logistic” plots. 
b [Table of exposure times, proportions p, and y′ = ln(p/(1 - p)); entries not recoverable]


The least-squares line relating y′ and x (where x is the exposure time in days) is
y′ = 1.51297 - 0.58721x. The negative slope reflects the fact that as exposure time increases
the hatch rate decreases.


c The logistic regression equation is


p = e^(1.51297 - 0.58721x) / (1 + e^(1.51297 - 0.58721x)).


For x = 3 the equation predicts


p = e^(1.51297 - 0.58721(3)) / (1 + e^(1.51297 - 0.58721(3))) = 0.438.


For x = 5 the equation predicts


p = e^(1.51297 - 0.58721(5)) / (1 + e^(1.51297 - 0.58721(5))) = 0.194.


d When p = 0.5, y′ = ln(p/(1 - p)) = ln(0.5/(1 - 0.5)) = 0. So, solving 1.51297 - 0.58721x = 0
we get x = 1.51297/0.58721 = 2.577 days.
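
The logistic predictions and the inverse calculation (finding the exposure time for a given proportion) can be sketched as follows, using the fitted line y′ = 1.51297 - 0.58721x from the solution. The function names are hypothetical.

```python
import math

INTERCEPT = 1.51297
SLOPE = -0.58721


def predicted_proportion(exposure_days):
    """Back-transform the fitted logit line to a proportion."""
    linear = INTERCEPT + SLOPE * exposure_days
    return math.exp(linear) / (1 + math.exp(linear))


def exposure_for_proportion(p):
    """Solve ln(p/(1 - p)) = INTERCEPT + SLOPE * x for x."""
    logit = math.log(p / (1 - p))
    return (logit - INTERCEPT) / SLOPE


print(round(predicted_proportion(3), 3))
print(round(predicted_proportion(5), 3))
print(round(exposure_for_proportion(0.5), 3))
```

The same two functions, with the intercept and slope replaced, apply to any of the logistic fits in this chapter.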


5.61 a [Table of concentration, proportion killed p, and ln(p/(1 - p)); entries not recoverable]


[Scatterplot: Proportion Killed against Concentration (g/cc), horizontal axis from 0.1 to 1.0]


b The least-squares line relating y′ and x (where x is the concentration in g/cc) is
y′ = -1.55892 + 5.76671x. The positive slope reflects the fact that as the concentration
increases the proportion of mosquitoes that die increases.


c When p = 0.5, y′ = ln(p/(1 - p)) = ln(0.5/(1 - 0.5)) = 0. So, solving -1.55892 + 5.76671x = 0
we get x = 1.55892/5.76671 = 0.270. LD50 is estimated to be around 0.270 g/cc.


5.63 a Any image plotted between the dashed lines would be associated with Cal Poly by roughly 
the same percentages of enrolling and non-enrolling students. 


b The images that were more commonly associated with non-enrolling students than with
enrolling students were “Average,” “Isolated,” and “Back-up school,” with “Back-up school” 
being the most common of these amongst non-enrolling students. The images that were more 
commonly associated with enrolling students than with non-enrolling students were (in 
increasing order of commonality amongst enrolling students) “Excitingly different,” 
“Personal,” “Selective,” “Prestigious,” “Exciting,” “Intellectual,” “Challenging,” 
“Comfortable,” “Fun,” “Career-oriented,” “Highly respected,” and “Friendly,” with this last 
image being marked by over 60% of students who enrolled and over 45% of students who 
didn’t enroll. The most commonly marked image amongst students who didn’t enroll was 
“Career-oriented.” 


5.65 a r = √0.89 = 0.943. (Note that r is 0.943 rather than -0.943 since the slope of the least-squares
line is positive.) There is a very strong positive linear relationship between assault
rate and lead exposure 23 years prior. No, we cannot conclude that lead exposure causes
increased assault rates, since the value of r close to 1 tells us that there is a strong linear
association between lead exposure and assault rate, but tells us nothing about causation.


b The equation of the least-squares regression line is ŷ = -24.08 + 327.41x, where y is the
assault rate and x is the lead exposure 23 years prior. When x = 0.5, ŷ = -24.08 + 327.41(0.5)
= 139.625 assaults per 100,000 people.
c 89% of the year-to-year variability in assault rates can be explained by the relationship
between assault rate and gasoline lead exposure 23 years earlier.


d The two time series plots, generally speaking, move together. That is, generally when one
goes up the other goes up and when one goes down the other goes down. Thus high assault
rates are associated with high lead exposures 23 years earlier and low assault rates are
associated with low lead exposures 23 years earlier.


5.67 a r= —0.981. This suggests a very strong linear relationship between the amount of catalyst and 
the resulting reaction time. 


b [Scatterplot: Reaction Time against Amount of Catalyst, horizontal axis from 1 to 5]
The word linear does not provide the most effective description of the relationship. There are
curves that would provide a much better fit.
5.69 a [Scatterplot: Exam Score against Test Anxiety, horizontal axis from 0 to 25]


There is one point, (0, 77), that is far separated from the other points in the plot. There is a
clear negative relationship between scores on the measure of test anxiety and exam scores.


b There appears to be a very strong negative linear relationship between test anxiety and exam 
score. (However, without the point (0, 77) the relationship would be significantly less strong.) 


c r = -0.912. This is consistent with the observations given in Part (b).
d No, we cannot conclude that test anxiety caused poor exam performance. Correlation
measures the strength of the linear relationship between the two variables, but tells us nothing
about causation.


5.71 a [Scatterplot: 8th Grade (2000) against 4th Grade (1996), horizontal axis from 10 to 24]


There is a clear positive relationship between the percentages of students who were proficient 
at the two times. There is the suggestion of a curve in the plot. 


b The equation of the least-squares line is ŷ = −3.13603 + 1.52206x, where x is the percentage
proficient in 4th grade (1996) and y is the percentage proficient in 8th grade (2000).


c When x = 14, ŷ = −3.13603 + 1.52206(14) = 18.173. This is slightly lower than the actual
value of 20 for Nevada.


5.73 a When x = 25, ŷ = 62.9476 − 0.54975(25) = 49.204. So the residual is
y − ŷ = 70 − 49.204 = 20.796.


b r = −√0.57 = −0.755. (The correlation coefficient is negative since the slope of the
least-squares regression line is negative.)


c We know that r² = 1 − SSResid/SSTo. Solving for SSResid we get
SSResid = SSTo(1 − r²) = 2520(1 − 0.57) = 1083.6. Therefore

s_e = √(SSResid/(n − 2)) = √(1083.6/8) = 11.638.
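
The arithmetic in Parts (b) and (c) can be checked with a short Python sketch. The helper name `residual_std_dev` is hypothetical, and n = 10 is an assumption inferred from the divisor n − 2 = 8 used in the solution.

```python
import math


def residual_std_dev(ss_total: float, r_squared: float, n: int) -> float:
    """Return s_e = sqrt(SSResid / (n - 2)), where SSResid = SSTo * (1 - r^2)."""
    ss_resid = ss_total * (1.0 - r_squared)
    return math.sqrt(ss_resid / (n - 2))


# Values quoted in the solution; n = 10 is inferred from n - 2 = 8 (assumption).
SS_TOTAL, R_SQUARED, N = 2520.0, 0.57, 10

# The slope of the least-squares line is negative, so r is the negative root.
r = -math.sqrt(R_SQUARED)
s_e = residual_std_dev(SS_TOTAL, R_SQUARED, N)

print(f"r = {r:.3f}, s_e = {s_e:.3f}")
```

Taking the negative square root is a choice made from the sign of the slope; r² alone does not determine the sign of r.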
5.75 a r = −0.717


b r = −0.835. The absolute value of this correlation is greater than the absolute value of the
correlation calculated in Part (a). This suggests that the transformation was successful in 
straightening the plot. 


[Scatterplots, including plots with log(x) on the horizontal axis]


b Plotting log(y) against log(x) does the best job of producing an approximately linear
relationship. The least-squares line of log(y) on log(x) is log(ŷ) = 1.61867 − 0.31646 log(x).
So when x = 25, log(ŷ) = 1.61867 − 0.31646 log(25) = 1.17629. Therefore
ŷ = 10^1.17629 = 15.007. The predicted lead content is 15.007 parts per million.


5.79 a r = 0


b For example, adding the point (6, 1) gives r=0.510. (Any y-coordinate greater than 0.973 
will work.) 


c For example, adding the point (6, -1) gives r =—0.510. (Any y-coordinate less than —0.973 
will work.) 
CR5.1


CR5.3


CR5.5


Cumulative Review Exercises 


Here is one possible design. Gather a number of volunteers (around 50, for example) who are 
willing to take part in an experiment involving exercise. Establish some measure of fitness, 
involving such criteria as strength, endurance, and muscle mass. Measure the fitness of each 
person. Randomly assign the 50 people to two groups, Group A and Group B. (This can be done 
by writing the names of the 50 people on identical slips of paper, placing the slips of paper in a 
hat, mixing them, and picking 25 names at random. Those 25 people will be put into Group A and 
the remainder will be put into Group B.) People in Group A should be instructed on a program of 
exercise that does not involve the sort of activity one would engage in at the gym, and this 
exercise should be undergone wearing the new sneakers. People in Group B should be instructed 
on an equivalent program of exercise that primarily involves gym-based activities, and this 
exercise should be undergone without the wearing of the new sneakers. At the end of the program 
the fitness of all the participants should be measured and a comparison should be made regarding 
the increase in fitness of the people in the two groups. 


This is an experiment since the participants are assigned to the groups by the experimenters. 


The peaks in rainfall do seem to be followed by peaks in the number of E. coli cases, with rainfall 
peaks around May 12, May 17, and May 23 being followed by peaks in the number of cases on 
May 17, May 23, and May 28th. (The incubation period seems to be more like 5 days than the 3 
to 4 days mentioned in the caption.) Thus the graph does show a close connection between 
unusually heavy rainfall and the incidence of the infection. The storms may not be responsible for 
the increased illness levels, however, since the graph can only show us association, not causation. 


[Scatterplot: Foal Weight (kg) versus Mare Weight (kg)]


The apparently random pattern in the scatterplot shows that there is very little relationship 
between the weight of the mare and the weight of her foal. This is supported by the value of the 
correlation coefficient. A value so close to zero shows that there is little to no linear relationship 
between the weight of the mare and the weight of the foal. 
CR5.7 a [Dotplot: Copper Content (%)]


b x̄ = (2.0 + ⋯ + 10.1)/26 = 3.654. The mean copper content is 3.654%.


Median = average of 13th and 14th values = (3.3 + 3.4)/2 = 3.35. The median copper content
is 3.35%.


c With a sample size of 26, the 8% trimmed mean removes 2 values from each end, since 8% 
of 26 is approximately 2. Removing 10.1 and 5.3 from the upper end will result in a 
noticeable reduction in the mean since 10.1 is an extreme value, while removing 2.0 and 2.4 
from the lower end will have less effect on the mean. Therefore the trimmed mean will be 
smaller than the mean. 


CR5.9 a The dotplot and stem-and-leaf display are shown below.


[Dotplot: Lowest Monthly Premium ($)]


1 | 8888888 

2 

3 

4 | 14 

a 

6 | 133444499 

(APS SBS) 

8 | 68 

97) 4 

10 | 01123336 

11 | 46 

121533 

E2237 

14 | 004 

1s) 

[Gus 

Le 19 

18 

19 | 66          Stem: Ones
20 | 0           Leaf: Tenths


b Looking at the displays, one would expect the mean and the median to be roughly the same.
(Looking at the data points between 4.1 and 20.0, you might notice some positive skewness,
and therefore conclude that the mean would be bigger than the median. However, the seven
values of 1.87 separated from the rest of the data at the lower end of the distribution will
roughly compensate for that positive skew, making the mean and the median roughly equal.)


c Mean = $9.459, median = $9.48. 


d A dotplot for the highest premium data is shown below.


[Dotplot: Highest Monthly Premium ($)]


e Mean = $72.846, median = $68.61. 


CR5.11
a x̄ = (3099 + ⋯ + 3700)/10 = 2965.2.


Variance = ((3099 − 2965.2)² + ⋯ + (3700 − 2965.2)²)/9 = 294416.622.
s = √294416.622 = 542.602.
The data values listed in order are:
2297 2401 2510 2682 2824 3068 3099 3112 3700 3959
Lower quartile = 3rd value = 2510.
Upper quartile = 8th value = 3112.
Interquartile range = 3112 − 2510 = 602.
b The interquartile range for the chocolate pudding data (602) is less than the interquartile
range for the tomato catsup data (1300). So there is less variability in sodium content for the
chocolate pudding data than for the tomato catsup data.


CR5.13
a x̄ = (4.8 + ⋯ + 3.7)/20 = 4.93.


The data, listed in order are: 


0.4 0.9 1.4 1.4 oe) 2.4 2.9 353 3.4 ar CEy 
4.8 > 5 5.4 6.1 tre] 10.8 13.8 14.8 


Median = average of 10th and 11th = (3.5 + 3.7)/2 = 3.6.


The mean is greater than the median. This is explained by the fact that the distribution of 
blood lead levels is positively skewed. 
CR5.15


CR5.17


[Boxplot: African Americans; horizontal axis: Blood Lead Level (micrograms per deciliter)]


The median blood lead level for the African Americans (3.6) is slightly higher than for the 
Whites (3.1). Both distributions seem to be positively skewed. There are two outliers in the 
data set for the African Americans. The distribution for the African Americans shows a 
greater range than the distribution for the Whites, even if you discount the two outliers. 


Yes, it appears that the variables are highly correlated. 


There is a strong positive linear relationship between the observations by the standard 
spectrophotometric method and the new, simpler method. 


Perfect correlation would result in the points lying exactly on some straight line, but not 
necessarily on the line described. 


This value of r² tells us that 76.64% of the variability in clutch size can be attributed to the
approximate linear relationship between snout-vent length and clutch size.


Using r² = 1 − SSResid/SSTo we see that SSResid = SSTo(1 − r²). So here
SSResid = 43951(1 − 0.7664) = 10266.9536. Therefore

s_e = √(SSResid/(n − 2)) = √(10266.9536/12) = 29.250. This is a typical deviation of an observed clutch
size from the clutch size predicted by the least-squares line.
CR5.19
a [Scatterplot: Agricultural Intensity versus Population Density]

Yes, the scatterplot shows a strong positive association between population density and
agricultural intensity.
b [Scatterplot: Agricultural Intensity versus (Population Density)^2]

The plot now seems to be straight, particularly if you disregard the point with the greatest x
value.
c [Scatterplot: Log(Agricultural Intensity) versus Population Density]

This transformation seems to have been successful in straightening the plot. Also, unlike the
plot in Part (b), the variability of the quantity measured on the vertical axis does not seem to
increase as x increases.
d [Scatterplot: Log(Agricultural Intensity) versus (Population Density)^2]


No, this transformation has not been successful in producing a linear relationship. There is a 
clear curve in the plot. 
6.1


6.3 


6.5 


6.7 


6.9 


6.11 


6.13 


Chapter 6 
Probability 


One percent of all people who suffer a cardiac arrest in New York City survive. 


b Roughly 23 of the 2329 cardiac arrest sufferers survived. 
P(L) · P(F) = (0.58)(0.5) = 0.29 ≠ P(L ∩ F). Therefore the events L and F are not independent.


No. The events are not independent because the probability of experiencing pain daily given that 
the person is male is not equal to the probability of experiencing pain daily given that the person 
is not male. 


They are dependent events, since the probability that the selected student has TB given that the
student is a recent immigrant is not equal to the (unconditional) probability that the student has TB.


(0.1) (0.1) (0.1) = 0.001. We have to assume that she deals with the three errands 
independently. 


P(remembers at least one) = 1— P(forgets them all) = 1—0.001= 0.999. 
P(remembers Ist, forgets 2nd, forgets 3rd) = (0.9)(0.1)(0.1) = 0.009. 


The expert was assuming that there was a 1 in 12 chance of a valve being in any one of the 12
clock positions and that the positions of the two air valves were independent. 


Since the car’s wheels are probably the same size, if one of the wheels happens to have its air 
valve in the same position as before then the other wheel is likely also to have its air valve in 
the same position as before. Thus the positions of the two air valves are not independent, and 
1/144 is smaller than the correct probability. 


P(1-2 subsystem works) = (0.9)(0.9) = 0.81. 


P(1-2 subsystem doesn't work) =1—0.81=0.19. 
P(3-4 subsystem doesn't work) = 0.19. 


P(system won't work) = (0.19)(0.19) = 0.0361. 
P(system will work) =1—0.0361 = 0.9639. 


P(system won't work) = (0.19)(0.19)(0.19) = 0.006859. 
So P(system will work) =1— 0.006859 = 0.993141. 


The probability that one particular subsystem will work is now (0.9) (0.9) (0.9) = 0.729. So 
the probability that the subsystem won’t work is 1—0.729 = 0.271. Therefore the probability 
that neither of the two subsystems works (and so the system doesn’t work) is (0.271) (0.271) 
= 0.073441. So the probability that the system works is 1 − 0.073441 = 0.926559.
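
The subsystem calculations above follow one pattern: multiply probabilities for components that must all work, and take one minus the product of failure probabilities for subsystems of which at least one must work. A minimal Python sketch of that pattern follows, assuming independent components that each work with probability 0.9; the function names `series_works` and `parallel_works` are hypothetical.

```python
from math import prod
from typing import Sequence


def series_works(component_probs: Sequence[float]) -> float:
    """All components must work: multiply the individual probabilities."""
    return prod(component_probs)


def parallel_works(subsystem_probs: Sequence[float]) -> float:
    """At least one subsystem must work: 1 - P(every subsystem fails)."""
    return 1.0 - prod(1.0 - p for p in subsystem_probs)


# Each component is assumed to work independently with probability 0.9.
pair = series_works([0.9, 0.9])           # one two-component subsystem
triple = series_works([0.9, 0.9, 0.9])    # one three-component subsystem

two_pairs = parallel_works([pair, pair])            # two two-component subsystems
three_pairs = parallel_works([pair, pair, pair])    # a third subsystem added
two_triples = parallel_works([triple, triple])      # three components per subsystem

print(round(two_pairs, 6), round(three_pairs, 6), round(two_triples, 6))
```

Adding a parallel subsystem raises the probability that the system works, while lengthening each series subsystem lowers it, which matches the three results in the solution.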


a


6.15 


6.23 


6.25 


6.27 


6.29 


P(female| predicted female) = 432/562 = 0.769. 


P(male| predicted male) = 390/438 = 0.890. 
Since these conditional probabilities are not equal, we see that a prediction that a baby is male 
and a prediction that a baby is female are not equally reliable. 


c Since the conditional probabilities in (a) and (b) are not equal, we see that the method is not
equally reliable for predicting gender for boys and for girls. 


P(very harmful | current smoker) = 60/96 = 0.625. 
P(very harmful | former smoker) = 78/99 = 0.788. 


P(very harmful | never smoked) = 86/99 = 0.869. 
Since the first probability calculated is less than either of the other two, the conclusion is justified. 


a 425/500 = 0.85
b 1 − 405/500 = 0.19
c (415/500)(415/500) = 0.6889
d 5220/6000 = 0.87


a The total number of students listed is 18000. The total number of males listed is 11200. So 
P(male) = 11200/18000 = 0.622. 


b 3000/18000 = 0.167 
c 2100/18000 = 0.117


d The number of males not from Agriculture is 11200 − 2100 = 9100. So the required
probability is 9100/18000 = 0.506. 


Answers will vary. 


Results of the simulation will vary. The correct probability that the project is completed on time 
is 0.8504. 


a Results of the simulation will vary. The correct probability that the project is completed on 
time is 0.6504. 


b Jacob’s change makes the bigger change in the probability that the project will be completed 
on time. 


They are dependent events, since someone who is attempting to quit is slightly more likely to 
return to smoking within two weeks if he/she does not use a nicotine aid than if he/she does use a 
nicotine aid. 
6.31 a 0.2 + 0.25 = 0.45
b 1 − 0.3 = 0.7
c 0.2 + 0.25 + 0.3 = 0.75
6.33 a 500000/42005100 = 0.0119
b 100/42005100 = 0.00000238
c 505100/42005100 = 0.0120
6.35 In Parts (a)-(c), examples of the possible simulation plans are given. 


a Use a single-digit random number to represent the outcome of the game. The digits 0—7 will 
represent a win for seed 1, and digits 8—9 will represent a win for seed 4. 


b_ Use a single-digit random number to represent the outcome of the game. The digits 0—5 will 
represent a win for seed 2, and digits 6—9 will represent a win for seed 3. 


c Use a single-digit random number to represent the outcome of the game. 
If seed 1 won game 1 and seed 2 won game 2, the digits 0–5 will represent a win for seed 1,
and digits 6–9 will represent a win for seed 2.
If seed 1 won game 1 and seed 3 won game 2, the digits 0–6 will represent a win for seed 1,
and digits 7–9 will represent a win for seed 3.
If seed 4 won game 1 and seed 2 won game 2, the digits 0–6 will represent a win for seed 2,
and digits 7–9 will represent a win for seed 4.
If seed 4 won game 1 and seed 3 won game 2, the digits 0–5 will represent a win for seed 3,
and digits 6–9 will represent a win for seed 4.
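
The digit assignments in Parts (a) to (c) can be turned into a short simulation. The following Python sketch is one possible implementation, not the textbook's: the names `WIN_DIGITS`, `play_game`, `play_tournament`, and `estimate_win_probability` are hypothetical, and a seeded pseudorandom generator stands in for a table of random digits.

```python
import random

# Number of digits (out of 0-9) that count as a win for the first-listed seed,
# taken from the simulation plans in Parts (a)-(c).
WIN_DIGITS = {(1, 4): 8, (2, 3): 6, (1, 2): 6, (1, 3): 7, (2, 4): 7, (3, 4): 6}


def play_game(seed_a: int, seed_b: int, rng: random.Random) -> int:
    """Play one game between seed_a and seed_b (seed_a < seed_b); return the winner."""
    digit = rng.randrange(10)  # a single random digit, 0 through 9
    return seed_a if digit < WIN_DIGITS[(seed_a, seed_b)] else seed_b


def play_tournament(rng: random.Random) -> int:
    """Game 1: seed 1 vs seed 4. Game 2: seed 2 vs seed 3. Game 3: the two winners."""
    winner_1 = play_game(1, 4, rng)
    winner_2 = play_game(2, 3, rng)
    low, high = sorted((winner_1, winner_2))
    return play_game(low, high, rng)


def estimate_win_probability(seed: int, runs: int, rng_seed: int = 0) -> float:
    """Estimate P(the given seed wins the tournament) from repeated simulated runs."""
    rng = random.Random(rng_seed)
    wins = sum(play_tournament(rng) == seed for _ in range(runs))
    return wins / runs


estimate = estimate_win_probability(seed=1, runs=20_000)
print(f"Estimated P(seed 1 wins) = {estimate:.3f}")
```

Under these digit assignments the exact probability that seed 1 wins is 0.8 × (0.6 × 0.6 + 0.4 × 0.7) = 0.512, so an estimate from many runs should land close to that value.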


d Answers will vary. 
e Answers will vary. 
f Answers will vary. 
g The estimated probabilities from Parts (e) and (f) will differ because they are based on
different sets of simulations. The estimate from Part (f) is likely to be the better one, since it
is based on more runs of the simulation than the estimate from Part (e).
7.1


7.3


Chapter 7 
Probability Distributions 


a Discrete 
b Continuous 
c Discrete 
d Discrete


e Continuous 


[Relative frequency histogram: Insured, Not insured]


b 1-0.6=0.4 
7.5 a [Relative frequency histogram: Size of Donation (0, 10, 25, 50)]


b $0
c 0.2 + 0.05 = 0.25
d 0.3 + 0.2 + 0.05 = 0.55


7.7 a The probability that everyone who shows up can be accommodated is
P(x ≤ 100) = 0.05 + 0.1 + 0.12 + 0.14 + 0.24 + 0.17 = 0.82.


b 1 − 0.82 = 0.18.
c For the person who is number 1 on the standby list to get a place on the flight, 99 or fewer
people must turn up for the flight. The probability that this happens is
P(x ≤ 99) = 0.05 + 0.1 + 0.12 + 0.14 + 0.24 = 0.65.


For the person who is number 3 on the standby list to get a place on the flight, 97 or fewer
people must turn up for the flight. The probability that this happens is
P(x ≤ 97) = 0.05 + 0.1 + 0.12 = 0.27.
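
Each answer in this exercise is a cumulative sum over part of the probability distribution. As a rough check, the Python sketch below stores the probabilities in a dictionary and sums them. The x values 95 through 100 are an assumption inferred from the sums in the solution (three terms for 97 or fewer, five for 99 or fewer, six for 100 or fewer), and `prob_at_most` is a hypothetical helper name.

```python
from typing import Dict

# Probabilities for the number of passengers who show up, x.
# The x values 95..100 are inferred from the sums in the solution (assumption);
# the remaining probability (0.18) belongs to values of x above 100.
PMF: Dict[int, float] = {95: 0.05, 96: 0.10, 97: 0.12, 98: 0.14, 99: 0.24, 100: 0.17}


def prob_at_most(pmf: Dict[int, float], k: int) -> float:
    """Return P(x <= k) by summing the probabilities of all values up to k."""
    return sum(p for x, p in pmf.items() if x <= k)


p_all_accommodated = prob_at_most(PMF, 100)     # Part (a)
p_not_all = 1.0 - p_all_accommodated            # Part (b)
p_first_standby = prob_at_most(PMF, 99)         # Part (c), standby number 1
p_third_standby = prob_at_most(PMF, 97)         # Part (c), standby number 3

print(round(p_all_accommodated, 2), round(p_not_all, 2),
      round(p_first_standby, 2), round(p_third_standby, 2))
```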


7.9 a Supplier 1
b Supplier 2
c Supplier 1 is to be recommended, since the bulbs from there have the greater lifetimes, on
average. (Also, bulbs from Supplier 1 are more consistent in terms of lifetime than bulbs from
Supplier 2.) 
d About 1000 hours


e About 100 hours 


© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 




7.11 a P(x ≤ 5) = (1/10)(5 − 0) = 0.5.
b P(3 ≤ x ≤ 5) = (1/10)(5 − 3) = 0.2.
c 5 minutes 


7.13 The density values at the relevant x values are given in the table below.


x         0      0.25   0.5    0.75   1
Density   0.5    0.75   1      1.25   1.5


a P(x ≤ 0.5) = (0.5 − 0)((0.5 + 1)/2) = 0.375.


b P(0.25 < x < 0.5) = (0.5 − 0.25)((0.75 + 1)/2) = 0.21875.


c P(x ≥ 0.75) = (1 − 0.75)((1.25 + 1.5)/2) = 0.34375.
7.15 a 0.9599 
b 0.2483 
c 1 − 0.8849 = 0.1151.
d 1 − 0.0024 = 0.9976.
e 0.7019—0.0132 = 0.6887 
f 0.8413—0.1587 = 0.6826. 
g 1.0000 
7.17 a 0.9909 
b 0.9909 
c 0.1093 
d 0.9996 — 0.8729 = 0.1267 
e 0.2912 —0.2206 = 0.0706 
f 1-0.9772=0.0228 
g 1-0.0004 =0.9996 


h 1.0000
7.19 a −1.96
b 2.33
c −1.645
d P(z < z*) = 1 − 0.02 = 0.98. So z* = 2.05.
e P(z < z*) = 1 − 0.01 = 0.99. So z* = 2.33.
f P(z > z*) = 0.2/2 = 0.1. So P(z < z*) = 1 − 0.1 = 0.9. Therefore z* = 1.28.
7.21 a P(z > z*) = 0.05/2 = 0.025. So P(z < z*) = 1 − 0.025 = 0.975. So z* = 1.96.
b P(z > z*) = 0.1/2 = 0.05. So P(z < z*) = 1 − 0.05 = 0.95. So z* = 1.645.
c P(z > z*) = 0.02/2 = 0.01. So P(z < z*) = 1 − 0.01 = 0.99. So z* = 2.33.
d P(z > z*) = 0.08/2 = 0.04. So P(z < z*) = 1 − 0.04 = 0.96. So z* = 1.75.
7.23 a P(x < 5) = P(z < (5 − 5)/0.2) = P(z < 0) = 0.5.
b P(x < 5.4) = P(z < (5.4 − 5)/0.2) = P(z < 2) = 0.9772.
c P(x ≤ 5.4) = P(x < 5.4) = 0.9772.
d P(4.6 < x < 5.2) = P((4.6 − 5)/0.2 < z < (5.2 − 5)/0.2) = P(−2 < z < 1) = 0.8413 − 0.0228
= 0.8185.
e P(x > 4.5) = P(z > (4.5 − 5)/0.2) = P(z > −2.5) = 1 − P(z < −2.5) = 1 − 0.0062 = 0.9938.
f P(x > 4.0) = P(z > (4.0 − 5)/0.2) = P(z > −5) = 1 − P(z < −5) = 1 − 0.0000 = 1.0000.
7.25 If P(z > z*) = 0.1 then P(z < z*) = 0.9; so z* = 1.28. Thus x = μ + (z*)σ = 1.6 + (1.28)(0.4)
= 2.113. The worst 10% of vehicles are those with emission levels greater than 2.113 parts per
billion.


7.27 a Let the left atrial diameter be x. P(x < 24) = P(z < (24 − 26.4)/4.2) = P(z < −0.57) = 0.2843.
b P(x > 32) = P(z > (32 − 26.4)/4.2) = P(z > 1.33) = 0.0918.


c P(25 < x < 30) = P((25 − 26.4)/4.2 < z < (30 − 26.4)/4.2) = P(−0.33 < z < 0.86)
= 0.8051 − 0.3707 = 0.4344.


d If P(z > z*) = 0.2, then P(z < z*) = 0.8; so z* = 0.84. Thus x = μ + (z*)σ = 26.4 + (0.84)(4.2)
= 29.928 mm.
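
The four parts above use the same recipe: standardize, then look up a cumulative probability (or, in Part (d), reverse the lookup). A minimal Python sketch of the same calculations using the standard library's `statistics.NormalDist` follows. The variable names are illustrative, and the results differ slightly from the table-based answers because z is not rounded to two decimal places.

```python
from statistics import NormalDist

# Left atrial diameter: normal with mean 26.4 mm and standard deviation 4.2 mm.
diameter = NormalDist(mu=26.4, sigma=4.2)

p_below_24 = diameter.cdf(24)                        # Part (a): P(x < 24)
p_above_32 = 1 - diameter.cdf(32)                    # Part (b): P(x > 32)
p_between = diameter.cdf(30) - diameter.cdf(25)      # Part (c): P(25 < x < 30)
cutoff_top_20 = diameter.inv_cdf(0.80)               # Part (d): value exceeded by 20%

print(f"{p_below_24:.4f} {p_above_32:.4f} {p_between:.4f} {cutoff_top_20:.3f}")
```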
7.29 Let the carbon monoxide exposure be x.
P(x > 20) = P(z > (20 − 18.6)/5.7) = P(z > 0.25) = 1 − P(z < 0.25) = 1 − 0.5987 = 0.4013.
P(x > 25) = P(z > (25 − 18.6)/5.7) = P(z > 1.12) = 1 − P(z < 1.12) = 1 − 0.8686 = 0.1314.


7.31 Let the diameter of the cork produced be x. 
P(2.9 < x < 3.1) = P((2.9 − 3.05)/0.01 < z < (3.1 − 3.05)/0.01) = P(−15 < z < 5) = 1.0000.


A cork made by the machine in this exercise is almost certain to meet the specifications. This
machine is therefore preferable to the one in Exercise 7.30.


7.33 The fastest 10% of applicants are those with the lowest 10% of times. If P(z < z*) = 0.1, then
z* = −1.28. The corresponding time is μ + (z*)σ = 120 + (−1.28)(20) = 94.4. Those with times
less than 94.4 seconds qualify for advanced training.


[Normal probability plot: Fussing Time versus Normal Score]


The clear curve in the normal probability plot tells us that the distribution of fussing times is 
not normal. 
b The square roots of the data values are shown in the table below.


Fussing Time | Normal Score | sqrt(Fussing Time) | 


1.46629 


ew 00min 0.714 2.44949 
0.946 2.82843 


11.00 1.245 3.31662 
14.00 Lio? 3.74166 


[Normal probability plot: sqrt(Fussing Time) versus Normal Score]


The transformation results in a pattern that is much closer to being linear than the pattern in 
Part (a). 
[Histogram: Frequency of x]


No. The distribution of x is positively skewed. 


[Histogram: Frequency of log(x)]


Yes. The histogram shows a distribution that is slightly closer to being symmetric than the 
distribution of the untransformed data. 
[Histogram: Frequency of sqrt(x)]


Both transformations produce histograms that are closer to being symmetric than the 
histogram of the untransformed data, but neither transformation produces a distribution that is 
truly close to being normal. 


7.39 Yes. The curve in the normal probability plot suggests that the distribution is not normal. 


7.41 [Normal probability plot: Diameter versus Normal score]


Since the pattern in the normal probability plot is very close to being linear, it is plausible that 
disk diameter is normally distributed. 
7.43 a [Histogram: Frequency]
The histogram is positively skewed. 
Relative 
Frequency wi Rel. Freq: 


Number of Purchases 
Interval Frequency Interval 
Width 
= Freq./2071 Int. Wdth 
904 


th: 
[SUS 20 ES ane 


Density 


[-100t0< 110] 9 ~~ | 0.004 | 0.488 | 0.009 
2100 cor 1300] Wot 100.0039 05 104475 eae ae 
[-130t0< 140] 6 | —0.003_—|_—(0.430 
150) 3 
Sc ee 


°140 to < 150 3 0.001 0.415 0.003 
°150 to < 160 0.000 0.402 0.000 
°160 to < 170 0.001 0.389 0.002 
[Density histogram: sqrt(Number of Purchases)]


No. The transformation has resulted in a histogram which is still clearly positively skewed. 


7.45 Yes. In each case the transformation has resulted in a histogram that is much closer to being 
symmetric than the original histogram. 


7.47 a No, since 5 ft 7 in. is 67 inches, and if x = height of a randomly chosen woman, then
P(x < 67) = P(z < (67 − 66)/2) = P(z < 0.5) = 0.6915, which is not more than 94%.


b About 69% of women would be excluded by the height restriction. 


7.49 Let the pH of the randomly selected soil sample be x. 


a P(5.9<x<6.15)=P((5.9-6)/0.1<z<(6.15—6)/0.1) = P(-1<z<1.5) 
= 0.9332 — 0.1587 = 0.7745. 


b P(x>6.10)=P(z>(6.10—6)/0.1) = P(z > 1) =1- P(z <1) =1-0.8413 = 0.1587. 


c P(x ≤ 5.95) = P(z ≤ (5.95 − 6)/0.1) = P(z ≤ −0.5) = 0.3085.


d If P(z > z*) = 0.05, then z* = 1.645. So the corresponding x value is μ + (z*)σ
= 6 + (1.645)(0.1) = 6.1645. The largest 5% of pH readings are those greater than 6.1645.
7.51 a P(250 < x < 300) = P((250 − 266)/16 < z < (300 − 266)/16) = P(−1 < z < 2.125)
= 0.9832 − 0.1587 = 0.8245.


b P(x ≤ 240) = P(z ≤ (240 − 266)/16) = P(z ≤ −1.625) = 0.0521.
c Sixteen days is 1 standard deviation, so we need P(−1 < z < 1) = 0.8413 − 0.1587 = 0.6826.


d P(x ≥ 310) = P(z ≥ (310 − 266)/16) = P(z ≥ 2.75) = 1 − P(z ≤ 2.75) = 1 − 0.9970 = 0.0030.


This should make us skeptical of the claim, since it is very unlikely that a pregnancy will last 
at least 310 days. 


e The insurance company will refuse to pay if the birth occurs within 275 days of the beginning 
of the coverage. If the conception took place after coverage began, then the insurance 
company will refuse to pay if the pregnancy is less than or equal to 275-14 = 261 days. 


P(x ≤ 261) = P(z ≤ (261 − 266)/16) = P(z ≤ −0.31) = 0.3783.
CR7.1


CR7.3 


CR7.5 


CR7.7 


Cumulative Review Exercises 


Obtain a group of volunteers (we’ll assume that 60 people are available for this). Randomly 
assign the 60 people to two groups, A and B. (This can be done by writing the people’s names on 
slips of paper, placing the slips in a hat, and drawing 30 slips at random. The people whose names 
are on those slips should be placed in Group A. The remaining people should be placed in Group 
B.) Meet with each person individually. For people in Group A offer an option of being given $5 
or for a coin to be flipped. Tell the person that if the coin lands heads, he/she will be given $10, 
but if the coin lands tails, he/she will not be given any money. Note the person’s choice, and then 
proceed according to the option the person has chosen. For people in Group B, give the person 
two $5 bills, and then offer a choice of returning one of the $5 bills, or flipping a coin. Tell the 
person that if the coin lands heads, he/she will keep both $5 bills, but if the coin lands tails, he/she 
must return both of the $5 bills. Note the person’s choice, and then proceed according to the 
option the person has chosen. Once you have met with all the participants, compare the two 
groups in terms of the proportions choosing the gambling options. 


No. The percentages given in the graph are said to be, for each year, the “percent increase in the 
number of communities installing” red-light cameras. This presumably means the percent 
increase in the number of communities with red-light cameras installed, in which case the 
positive results for all of the years 2003 to 2009 show that a great many more communities had 
red-light cameras installed in 2009 than in 2002. 


First, the median is approximately equal to the mean, implying a roughly symmetrical 
distribution. Second, consider the comparison of z values given below. 


Statistic z value in this distribution | z value in normal distribution 


−1.645
-0.67 
0.67 
1.645 


Although the z values do not agree exactly, they are somewhat close, and therefore it would seem 
reasonable to suggest that the distribution could have been approximately normal. 


A ball bearing is acceptable if its diameter is between 0.496 and 0.504 inches. Under the new 
setting, P(0.496 < diameter < 0.504) = P((0.496 − 0.499)/0.002 < z < (0.504 − 0.499)/0.002)
= P(−1.5 ≤ z ≤ 2.5) = 0.9938 − 0.0668 = 0.9270. So the probability that a ball bearing is
unacceptable is 1 − 0.9270 = 0.0730. Therefore, 7.3% of the ball bearings will be unacceptable.
CR7.9 [Normal probability plot; horizontal axis: Normal Score]


The pattern in the normal probability plot is reasonably close to being linear, and so, yes, 
normality is plausible. 
Chapter 8
Sampling Variability and Sampling Distributions 


Note: In this chapter, numerical answers to questions involving the normal distribution were found using
statistical tables. Students using calculators or computers will find that their answers differ slightly from
those given.


8.1 A population characteristic is a quantity that summarizes the whole population. A statistic is a 
quantity calculated from the values in a sample. 


8.3 a Population characteristic 
b Statistic
c Population characteristic
d Population characteristic 
e Statistic 

8.5 Answers will vary. 


8.7 a

[Table: Sample | Sample mean]

[Density histogram: Sample Mean]

[Table: Sample | Sample Mean]

The sampling distribution of the sample mean, x̄, is shown below.

[Density histogram: Sample Mean]


c Both distributions are symmetrical, and their means are equal (2.5). However, the “with
replacement” version has a greater spread than the first distribution, with values ranging from 
1 to 4 in the “with replacement” distribution, and from 1.5 to 3.5 in the “without 
replacement” distribution. The stepped pattern of the “with replacement” distribution more 
closely resembles a normal distribution than does the shape of the “without replacement” 
distribution. 


8.9 


Sample Mean | Sample Median | (Max + Min)/2 
: ean eee eee 


(Sample Median) | 0.7 | 0.3 


(Max + Miny/2 | 2.5 Bo 
pa(Max + Min)/2) 
Using the sampling distributions above, the means of the three statistics are calculated to be
E(x̄) = 3.2, E(Sample median) = 3.3, and E((Max + Min)/2) = 3.15. Since μ = 3.2 and
E(x̄) = 3.2, we know that, on average, the sample mean will give the correct value for μ,
which is not the case for either of the two other statistics. Thus, the sample mean would be
the best of the three statistics for estimating μ. (Also, since the distribution of the sample
mean has less variability than either of the other two distributions, the sample mean will
generally produce values that are closer to μ than the values produced by either of the other
statistics.)


8.11 The sampling distribution of x̄ will be approximately normal for the sample sizes in Parts (c)–(f),
since those sample sizes are all greater than or equal to 30.


8.13 a μ_x̄ = 40, and σ_x̄ = σ/√n = 5/√64 = 0.625. Since n = 64 ≥ 30, the distribution of x̄ will be
approximately normal.


b Since μ − 0.5 = 40 − 0.5 = 39.5 and μ + 0.5 = 40 + 0.5 = 40.5, the required probability is
P(39.5 < x̄ < 40.5) = P((39.5 − 40)/0.625 < z < (40.5 − 40)/0.625) = P(−0.8 < z < 0.8)
= 0.7881 − 0.2119 = 0.5762.


c Since μ − 0.7 = 40 − 0.7 = 39.3 and μ + 0.7 = 40 + 0.7 = 40.7, the probability that x̄ will be
within 0.7 of μ is P(39.3 < x̄ < 40.7) = P((39.3 − 40)/0.625 < z < (40.7 − 40)/0.625)
= P(−1.12 < z < 1.12) = 0.8686 − 0.1314 = 0.7372. Therefore, the probability that x̄ will be
more than 0.7 from μ is 1 − 0.7372 = 0.2628.
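
The steps in this exercise (find the standard deviation of x̄, then treat x̄ as approximately normal) can be sketched in Python with the standard library. The helper name `sample_mean_distribution` is hypothetical, and the normal model for x̄ relies on the n ≥ 30 condition noted in Part (a).

```python
import math
from statistics import NormalDist


def sample_mean_distribution(mu: float, sigma: float, n: int) -> NormalDist:
    """Approximate sampling distribution of the sample mean: mean mu, sd sigma/sqrt(n)."""
    return NormalDist(mu=mu, sigma=sigma / math.sqrt(n))


# Population mean 40, population standard deviation 5, sample size 64.
x_bar = sample_mean_distribution(mu=40, sigma=5, n=64)

# Part (b): probability that the sample mean is within 0.5 of the population mean.
p_within_half = x_bar.cdf(40.5) - x_bar.cdf(39.5)

# Part (c): probability that the sample mean is more than 0.7 from the population mean.
p_beyond_0_7 = 1 - (x_bar.cdf(40.7) - x_bar.cdf(39.3))

print(f"sd = {x_bar.stdev}, {p_within_half:.4f}, {p_beyond_0_7:.4f}")
```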


8.15 a μ_x̄ = 2 and σ_x̄ = σ/√n = 0.8/√9 = 0.267.


b In each case μ_x̄ = 2.
When n = 20, σ_x̄ = σ/√n = 0.8/√20 = 0.179.
When n = 100, σ_x̄ = σ/√n = 0.8/√100 = 0.08.
The centers of the distributions of the sample mean are all at the population mean, while the
standard deviations (and therefore spreads) of these distributions are smaller for larger sample
sizes. The sample size of n = 100 is most likely to result in a sample mean close to μ, since
this is the sample size that results in the smallest standard deviation of the distribution of x̄.


8.17 a Since the distribution of interpupillary distances is normal, the distribution of x̄ is also normal.
P(64 < x̄ < 67) = P((64 − 65)/(5/√25) < z < (67 − 65)/(5/√25)) = P(−1 < z < 2) = 0.9772 − 0.1587 = 0.8185.
P(x̄ ≥ 68) = P(z ≥ (68 − 65)/(5/√25)) = P(z ≥ 3) = 0.0013.
b Since n = 100 ≥ 30, the distribution of x̄ is approximately normal.
P(64 < x̄ < 67) = P((64 − 65)/(5/√100) < z < (67 − 65)/(5/√100)) = P(−2 < z < 4) = 1.0000 − 0.0228 = 0.9772.
P(x̄ ≥ 68) = P(z ≥ (68 − 65)/(5/√100)) = P(z ≥ 6) = 0.0000.


8.19 Given that the true process mean is 0.5, the probability that x̄ is not in the shutdown range is
P(0.49 < x̄ < 0.51) = P((0.49 − 0.5)/(0.02/√36) < z < (0.51 − 0.5)/(0.02/√36)) = P(−3 < z < 3)
= 0.9987 − 0.0013 = 0.9974.


So the probability that the manufacturing line will be shut down unnecessarily is 1 − 0.9974
= 0.0026.


8.21 P(Total > 6000) = P(x̄ > 60) = P(z > (60 − 50)/(20/√100)) = P(z > 5) = 0.0000.


8.23 In each part, μ_p̂ = 0.65 and σ_p̂ = √((0.65)(0.35)/n).
a n = 10: σ_p̂ = 0.151.
b n = 20: σ_p̂ = 0.107.
c n = 30: σ_p̂ = 0.087.
d n = 50: σ_p̂ = 0.067.
e n = 100: σ_p̂ = 0.048.
f n = 200: σ_p̂ = 0.034.


8.25 a μ_p̂ = 0.07, σ_p̂ = √((0.07)(0.93)/100) = 0.026.


b No, since np = 100(0.07) = 7, which is not greater than or equal to 10.
The mean is unchanged since the mean of the distribution of the sample proportion is always
equal to the population proportion, but the standard deviation changes to
σ_p̂ = √((0.07)(0.93)/200) = 0.018.
Yes, since np = 200(0.07) = 14 and n(1 − p) = 200(0.93) = 186, which are both greater than or
equal to 10.
P(p̂ > 0.1) = P(z > (0.1 − 0.07)/√((0.07)(0.93)/200)) = P(z > 1.66) = 0.0485.
8.27 μ_p̂ = p = 0.005, σ_p̂ = √((0.005)(0.995)/100) = 0.007.
No, since np = 100(0.005) = 0.5, which is not greater than or equal to 10.
We need both np and n(1 − p) to be greater than or equal to 10, and since p < 1 − p it will be
sufficient to ensure that np ≥ 10. So we need n(0.005) ≥ 10, that is, n ≥ 10/0.005 = 2000.
8.29 If p=0.5, u, =0.5, 0; = aoe = 0.0333. Also np =225(0.5)=112.5210 and 
n(1— p)=225(0.5)=112.5210, and so p has an approximately normal distribution. 
If p=0.6, u, =0.6, 0; = wee = 0.0327. Also np = 225(0.6)=135210 and 
n(1— p)=225(0.4)=90>10, and so p has an approximately normal distribution. 
hie 4 Wie hicty al ae OGY hi Ppp As Ss = P(z 23)=0.0013. 
4{(0.5)(0.5)/225 
Ie 0.0,..P( 7 2.0.6) =P eal RT = P(z20)=0.5. 
/(0.6)(0.4)/225 
For a larger sample size, the value of p is likely to be closer to p. So, for n = 400, when 
p=0.5, P(p 20.6) will be smaller. When p=0.6, P(p = 0.6) will still be 0, that is, it will 
remain the same. 
8.31 P(Returned) = P(p > 0.02) = P| z> SN Chel a = P(z>-1.95)=0.9744. 
¥(0.05)(0.95)/200 
0.02—0.1 


P(Returned) = P(p > 0.02) = ic > | = P(z >-3.77) =0.9999. So the 


(0.1)(0.9)/200 


probability that the shipment is not returned is 1— 0.9999 = 0.0001. 
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8.33 a Since n=100230, ¥ is approximately normally distributed. Its mean is 50 and its standard 


deviation is v1/V100 =0.1. 


b  P(49.75< ¥ < 50.25) = p( Bras pe ness a) 


= P(-2.5<z< 2.5) =0.9938 — 0.0062 
0.1 0.1 


= 0.9876. 


c Since uw. =50, P(¥<50)=0.5. 


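8.35 a Let the index of the specimen be $x$.

$$P(850 < x < 1300) = P\left(\frac{850-1000}{150} < z < \frac{1300-1000}{150}\right) = P(-1 < z < 2) = 0.9772 - 0.1587 = 0.8185.$$

b $$P(950 < \bar{x} < 1100) = P\left(\frac{950-1000}{150/\sqrt{10}} < z < \frac{1100-1000}{150/\sqrt{10}}\right) = P(-1.05 < z < 2.11) = 0.9826 - 0.1469 = 0.8357.$$

$$P(850 \leq \bar{x} \leq 1300) = P\left(\frac{850-1000}{150/\sqrt{10}} \leq z \leq \frac{1300-1000}{150/\sqrt{10}}\right) = P(-3.16 \leq z \leq 6.32) = 1.0000 - 0.0008 = 0.9992.$$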


8.37 $$P(\text{Total} > 5300) = P(\bar{x} > 5300/50) = P(\bar{x} > 106) = P\left(z > \frac{106-100}{30/\sqrt{50}}\right) = P(z > 1.41) = 0.0793.$$


Chapter 9
Estimation Using a Single Sample


Note: In this chapter, numerical answers to questions involving the normal and t distributions were found
using statistical tables. Students using calculators or computers will find that their answers differ slightly
from those given.


9.1 Statistics II and III are preferable to Statistic I since they are unbiased (their means are equal to
the value of the population characteristic). However, Statistic II is preferable to Statistic III since
its standard deviation is smaller. So Statistic II should be recommended.

9.3 $\hat{p} = 1720/6212 = 0.277$.

9.5 The value of $p$ is estimated using $\hat{p}$, and the value of $\hat{p}$ is $14/20 = 0.7$.

9.7 a The value of $\mu$ is estimated using $\bar{x} = (410 + \cdots + 530)/7 = 421.429$.

b The value of $\sigma^2$ is estimated using $s^2 = 10414.286$.

c The value of $\sigma$ is estimated using $s = 102.050$. No, $s$ is not an unbiased statistic for
estimating $\sigma$.

9.9 a The value of $\mu$ is estimated using $\bar{x} = (103 + \cdots + 99)/10 = 120.6$ therms.

b The value of $\tau$ is estimated to be $10000(120.6) = 1{,}206{,}000$ therms.

c The value of $p$ is estimated using $\hat{p} = 8/10 = 0.8$.

d The population median is estimated using the sample median, which is 120 therms.

9.11 a 1.96

b 1.645

c 2.58

d 1.28

e 1.44

9.13 a The larger the confidence level the wider the interval.

b The larger the sample size the narrower the interval.

c Values of $\hat{p}$ further from 0.5 give smaller values of $\hat{p}(1-\hat{p})$. Therefore, the further the
value of $\hat{p}$ from 0.5, the narrower the interval.


9.15 Check of Conditions

1. Since $n\hat{p} = 1100(990/1100) = 990 \geq 10$ and $n(1-\hat{p}) = 1100(110/1100) = 110 \geq 10$, the sample
size is large enough.

2. The sample size of n = 1100 is much smaller than 10% of the population size (the number of
drivers).

3. We are told to assume that the sample is representative of the population of drivers. Having
made this assumption it is reasonable to regard the sample as a random sample from the
population.

Calculation

The 99% confidence interval for $p$ is

$$\hat{p} \pm 2.58\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{990}{1100} \pm 2.58\sqrt{\frac{(990/1100)(110/1100)}{1100}} = (0.877, 0.923).$$

Interpretation
We are 99% confident that the proportion of all drivers who have engaged in careless or
aggressive driving in the last six months is between 0.877 and 0.923.
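The calculation in 9.15 follows the large-sample pattern of sample proportion plus or minus (z critical value) times sqrt(p-hat(1 - p-hat)/n). As a minimal sketch (the function name is illustrative, and the critical value is passed in by hand rather than looked up):

```python
import math

def proportion_ci(successes, n, z_crit):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    half_width = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Exercise 9.15: 990 of 1100 drivers, 99% confidence (z critical value 2.58).
low, high = proportion_ci(990, 1100, 2.58)
print(f"({low:.3f}, {high:.3f})")
```

The same function reproduces the other proportion intervals in this chapter when given the corresponding counts and critical value (1.645 for 90%, 1.96 for 95%, 2.33 for 98%).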


9.17 Let p be the proportion of all coastal residents who would evacuate. 

Check of Conditions 

1. Since $n\hat{p} = 5046(0.69) = 3482 \geq 10$ and $n(1-\hat{p}) = 5046(0.31) = 1564 \geq 10$, the sample size is
large enough.

2. The sample size of n = 5046 is much smaller than 10% of the population size (the number of 
people who live within 20 miles of the coast in high hurricane risk counties of these eight 
southern states). 

3. The sample was selected in a way designed to produce a representative sample. So, it is 
reasonable to regard the sample as a random sample from the population. 

Calculation 

The 98% confidence interval for p is 


$$\hat{p} \pm 2.33\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.69 \pm 2.33\sqrt{\frac{(0.69)(0.31)}{5046}} = (0.675, 0.705).$$


Interpretation of the Confidence Interval 
We are 98% confident that the proportion of all coastal residents who would evacuate is between 


0.675 and 0.705. 

Interpretation of the Confidence Level 

If we were to take a large number of random samples of size 5046, 98% of the resulting 
confidence intervals would contain the true proportion of all coastal residents who would 
evacuate. 
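The interpretation of the confidence level can be illustrated by simulation. The sketch below is an illustration only: the true proportion of 0.69, the reduced sample size of 500 (chosen to keep the run fast), the number of repetitions, and the seed are all assumptions, not values from the study.

```python
import math
import random

def wald_interval(successes, n, z_crit):
    """Large-sample confidence interval for a proportion."""
    p_hat = successes / n
    half_width = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(p_true, n, z_crit, reps, seed=0):
    """Fraction of simulated intervals that contain the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < p_true for _ in range(n))
        low, high = wald_interval(successes, n, z_crit)
        if low <= p_true <= high:
            hits += 1
    return hits / reps

# 98% intervals (z critical value 2.33) from 1000 simulated samples.
print(coverage(p_true=0.69, n=500, z_crit=2.33, reps=1000))
```

The printed fraction should be close to 0.98, though it will vary a little with the seed and the number of repetitions.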


9.19 a Check of Conditions 

1. Since $n\hat{p} = 2002(1321/2002) = 1321 \geq 10$ and $n(1-\hat{p}) = 2002(681/2002) = 681 \geq 10$, the
sample size is large enough.

2. The sample size of n = 2002 is much smaller than 10% of the population size (the number 
of Americans age 8 to 18). 

3. The sample was selected in a way designed to produce a representative sample. So, it is 
reasonable to regard the sample as a random sample from the population. 

Calculation 

The 90% confidence interval for p is 


$$\hat{p} \pm 1.645\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{1321}{2002} \pm 1.645\sqrt{\frac{(1321/2002)(681/2002)}{2002}} = (0.642, 0.677).$$

Interpretation


We are 90% confident that the proportion of all Americans age 8 to 18 who own a cell phone 
is between 0.642 and 0.677. 


b Check of Conditions 
1. Since $n\hat{p} = 2002(1522/2002) = 1522 \geq 10$ and $n(1-\hat{p}) = 2002(480/2002) = 480 \geq 10$, the
sample size is large enough.
2. The sample size of n = 2002 is much smaller than 10% of the population size (the number
of Americans age 8 to 18).
3. The sample was selected in a way designed to produce a representative sample. So it is
reasonable to regard the sample as a random sample from the population.
Calculation 


The 90% confidence interval for p is 


$$\hat{p} \pm 1.645\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{1522}{2002} \pm 1.645\sqrt{\frac{(1522/2002)(480/2002)}{2002}} = (0.745, 0.776).$$


Interpretation 
We are 90% confident that the proportion of all Americans age 8 to 18 who own an MP3 
player is between 0.745 and 0.776. 


c The interval in Part (b) is narrower than the interval in Part (a) because the sample proportion
in Part (b) is further from 0.5, thus reducing the value of the estimated standard deviation of
the sample proportion (given by the expression $\sqrt{\hat{p}(1-\hat{p})/n}$).


9.21. a Check of Conditions 

1. Since $n\hat{p} = 500(350/500) = 350 \geq 10$ and $n(1-\hat{p}) = 500(150/500) = 150 \geq 10$, the sample
size is large enough.

2. The sample size of n = 500 is much smaller than 10% of the population size (the number
of potential jurors).

3. We are told to assume that the sample is representative of the population of potential
jurors. Having made this assumption it is reasonable to regard the sample as a random
sample from the population.

Calculation

The 95% confidence interval for $p$ is

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{350}{500} \pm 1.96\sqrt{\frac{(350/500)(150/500)}{500}} = (0.660, 0.740).$$


Interpretation 
We are 95% confident that the proportion of all potential jurors who regularly watch at least 


one crime-scene investigation series is between 0.660 and 0.740. 
b Wider 


9.23 a Check of Conditions
1. Since $n\hat{p} = 526(137/526) = 137 \geq 10$ and $n(1-\hat{p}) = 526(389/526) = 389 \geq 10$, the sample
size is large enough.
2. The sample size of n = 526 is much smaller than 10% of the population size (the number
of U.S. businesses).
3. We must assume that the sample is a random sample of U.S. businesses.
Calculation
The 95% confidence interval for $p$ is

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{137}{526} \pm 1.96\sqrt{\frac{(137/526)(389/526)}{526}} = (0.223, 0.298).$$
Interpretation 
We are 95% confident that the proportion of all U.S. businesses that have fired workers for 
misuse of the Internet is between 0.223 and 0.298. 
b The sample proportion of businesses that had fired workers for misuse of email is further
from 0.5 than the sample proportion of businesses that had fired workers for misuse of the
Internet, making the value of $\sqrt{\hat{p}(1-\hat{p})/n}$ smaller. This makes the confidence interval
narrower. Additionally, the critical value of z for a 90% confidence interval is smaller than
the critical value of z for a 95% confidence interval, also making the second confidence
interval narrower.
9.25 Check of Conditions
1. Since $n\hat{p} = 1002(0.82) = 822 \geq 10$ and $n(1-\hat{p}) = 1002(0.18) = 180 \geq 10$, the sample size is
large enough.
2. The sample size of n = 1002 is much smaller than 10% of the population size (the number of
adults in the country).
3. We are told that the sample was a random sample from the population.
Calculation
When the sample proportion of 0.82 is used as an estimate of the population proportion, the 95%
error bound on this estimate is

$$1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96\sqrt{\frac{(0.82)(0.18)}{1002}} = 0.024.$$

Interpretation
We are 95% confident that the proportion of all adults who believe that the shows are either “totally
made up” or “mostly distorted” is within 2.4 percentage points of the sample proportion of 82%.
9.27 We are 95% confident that the proportion of all adult drivers who would say that they often or
sometimes talk on a cell phone while driving is within $1.96\sqrt{\hat{p}(1-\hat{p})/n}$
$= 1.96\sqrt{(0.36)(0.64)/1004} = 0.030$, that is, 3.0 percentage points, of the sample proportion of
36%. The reported bound on error is slightly inaccurate, in that it is wrong by one tenth of a
percentage point.
9.29 a Check of Conditions 


1. Since $n\hat{p} = 89(18/89) = 18 \geq 10$ and $n(1-\hat{p}) = 89(71/89) = 71 \geq 10$, the sample size is
large enough.

2. The sample size of n = 89 is much smaller than 10% of the population size (the number
of people under 50 years old who use this type of defibrillator).

3. We are told to assume that the sample is representative of patients under 50 years old
who receive this type of defibrillator. Having made this assumption it is reasonable to
regard the sample as a random sample from the population.

Calculation

The 95% confidence interval for $p$ is

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{18}{89} \pm 1.96\sqrt{\frac{(18/89)(71/89)}{89}} = (0.119, 0.286).$$

Interpretation
We are 95% confident that the proportion of all patients under 50 years old who experience a
failure within the first two years after receiving this type of defibrillator is between 0.119 and
0.286.

b Check of Conditions

1. Since $n\hat{p} = 362(13/362) = 13 \geq 10$ and $n(1-\hat{p}) = 362(349/362) = 349 \geq 10$, the sample
size is large enough.

2. The sample size of n = 362 is much smaller than 10% of the population size (the number
of people age 50 or older who use this type of defibrillator).

3. We are told to assume that the sample is representative of patients age 50 or older who
receive this type of defibrillator. Having made this assumption it is reasonable to regard
the sample as a random sample from the population.

Calculation

The 99% confidence interval for $p$ is

$$\hat{p} \pm 2.58\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{13}{362} \pm 2.58\sqrt{\frac{(13/362)(349/362)}{362}} = (0.011, 0.061).$$

Interpretation
We are 99% confident that the proportion of all patients age 50 or older who experience a
failure within the first two years after receiving this type of defibrillator is between 0.011 and
0.061.

c Using the estimate of $p$ from the study, 18/89, the required sample size is given by

$$n = p(1-p)\left(\frac{1.96}{B}\right)^2 = \left(\frac{18}{89}\right)\left(\frac{71}{89}\right)\left(\frac{1.96}{0.03}\right)^2 = 688.685.$$

So a sample of size at least 689 is required.
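The sample-size formula n = p(1 - p)(1.96/B)^2 is always rounded up to the next whole number. A minimal sketch follows; the function name is illustrative, and the bound B = 0.03 used in the example is an assumption inferred from the printed answer of 689, not quoted from the exercise.

```python
import math

def sample_size_for_proportion(p, z_crit, bound):
    """Smallest n for which the error bound on p-hat is at most `bound`."""
    raw = p * (1 - p) * (z_crit / bound) ** 2
    # Round away floating-point noise before rounding up, so that an exact
    # whole-number answer is not pushed to the next integer.
    return raw, math.ceil(round(raw, 6))

# Preliminary estimate p = 18/89, 95% confidence, assumed bound B = 0.03.
raw, n_required = sample_size_for_proportion(18 / 89, 1.96, 0.03)
print(round(raw, 3), n_required)

# With no preliminary estimate, p = 0.5 gives the most conservative n.
print(sample_size_for_proportion(0.5, 1.96, 0.05)[1])
```

Using p = 0.5 maximizes p(1 - p), so it yields a sample size that is large enough whatever the true proportion turns out to be.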


9.31 $$n = p(1-p)\left(\frac{1.96}{B}\right)^2 = 0.25\left(\frac{1.96}{0.02}\right)^2 = 2401.$$ A sample size of 2401 is required.

9.33 $$n = p(1-p)\left(\frac{1.96}{B}\right)^2 = 0.25\left(\frac{1.96}{0.05}\right)^2 = 384.16.$$ A sample size of 385 is required.

9.35 a 2.12


9.39


9.41


The width of the first interval is 52.7 - 51.3 = 1.4. The width of the second interval is
50.6 - 49.4 = 1.2. Since the confidence interval is given by $\bar{x} \pm (t\text{ critical value})(s/\sqrt{n})$, the
width of the confidence interval is given by $2 \cdot (t\text{ critical value})(s/\sqrt{n})$. Therefore, for samples of
equal standard deviations, the larger the sample size the narrower the interval. Thus it is the
second interval that is based on the larger sample size.


Conditions 

1. Since $n = 411 \geq 30$, the sample size is large enough.

2. We are told to assume that the sample is representative of students taking introductory
psychology at this university. Having made this assumption it is reasonable to regard the
sample as a random sample from the population.

Calculation

The 95% confidence interval for $\mu$ is

$$\bar{x} \pm (t\text{ critical value})\cdot\frac{s}{\sqrt{n}} = 7.74 \pm 1.96\cdot\frac{3.40}{\sqrt{411}} = (7.411, 8.069).$$
Interpretation 
We are 95% confident that the mean time spent studying for this exam for all students taking 
introductory psychology at this university is between 7.411 and 8.069 hours. 


Conditions 

1. Since $n = 411 \geq 30$, the sample size is large enough.

2. We are told to assume that the sample is representative of students taking introductory
psychology at this university. Having made this assumption it is reasonable to regard the
sample as a random sample from the population.

Calculation

The 90% confidence interval for $\mu$ is

$$\bar{x} \pm (t\text{ critical value})\cdot\frac{s}{\sqrt{n}} = 43.18 \pm 1.645\cdot\frac{21.46}{\sqrt{411}} = (41.439, 44.921).$$
Interpretation 
We are 90% confident that the mean percent of study time that occurs in the 24 hours prior to 
the exam for all students taking introductory psychology at this university is between 41.439 
and 44.921. 


The fact that the mean is much greater than the median suggests that the distribution of times 
spent volunteering in the sample was positively skewed. 


With the sample mean being much greater than the sample median, and with the sample 
being regarded as representative of the population, it seems very likely that the population is 
strongly positively skewed, and therefore not normally distributed. 


Since $n = 1086 \geq 30$, the sample size is large enough for us to use the t confidence interval,
even though the population distribution is not approximately normal.

In addition to observing that the sample is large, we need to point out that the sample was
selected in a way that makes it reasonable to regard it as representative of the population, and
therefore that it is reasonable to regard the sample as random. This justifies use of the t
confidence interval. The 98% confidence interval for $\mu$ is then

$$\bar{x} \pm (t\text{ critical value})\cdot\frac{s}{\sqrt{n}} = 5.6 \pm 2.33\cdot\frac{5.2}{\sqrt{1086}} = (5.232, 5.968).$$

We are 98% confident that the mean time spent volunteering for the population of parents of
school age children is between 5.232 and 5.968 hours.


9.43. a The 90% confidence interval would be narrower. In order to be only 90% confident that the 
interval captures the true population mean, the interval does not have to be as wide as it 
would in order to be 95% confident of capturing the true population mean. 


b The statement is not correct. The population mean, $\mu$, is a constant, and therefore we cannot
talk about the probability that it falls within a certain interval.


c The statement is not correct. We can say that on average 95 out of every 100 samples will
result in confidence intervals that will contain $\mu$, but we cannot say that in 100 such samples,
exactly 95 will result in confidence intervals that contain $\mu$.


9.45 a For samples of equal sizes, those with greater variability will result in wider confidence 
intervals. The 12 to 23 month and 24 to 35 month samples resulted in confidence intervals of 
width 0.4, while the less than 12 month sample resulted in a confidence interval of width 0.2. 
So the 12 to 23 month and 24 to 35 month samples are the ones with the greater variability. 


b For samples of equal variability, those with greater sample sizes will result in narrower 
confidence intervals. Thus the less than 12 month sample is the one with the greater sample 
size. 


c Since the new interval is wider than the interval given in the question, the new interval must 
be for a higher confidence level. (By obtaining a wider interval, we have a greater confidence 
that the interval captures the true population mean.) Thus the new interval must have a 99% 
confidence level. 


9.47 a Conditions 
1. Since $n = 100 \geq 30$, the sample size is large enough.
2. We are told to assume that the sample was a random sample of passengers.
Calculation
The t critical value for 99 degrees of freedom (for a 95% confidence level) is between 1.98
and 2.00. We will use an estimate of 1.99. Thus, the 95% confidence interval for $\mu$ is

$$\bar{x} \pm (t\text{ critical value})\cdot\frac{s}{\sqrt{n}} = 183 \pm 1.99\cdot\frac{20}{\sqrt{100}} = (179.02, 186.98).$$

Interpretation
We are 95% confident that the mean summer weight is between 179.02 and 186.98 lb.


b Conditions
1. Since $n = 100 \geq 30$, the sample size is large enough.
2. We are told to assume that the sample was a random sample.
Calculation
The 95% confidence interval for $\mu$ is

$$\bar{x} \pm (t\text{ critical value})\cdot\frac{s}{\sqrt{n}} = 190 \pm 1.99\cdot\frac{23}{\sqrt{100}} = (185.423, 194.577).$$

Interpretation
We are 95% confident that the mean winter weight is between 185.423 and 194.577 lb.


c Based on the Frontier Airlines data, neither recommendation is likely to be an accurate
estimate of the mean passenger weight, since 190 is not contained in the confidence interval
for the mean summer weight and 195 is not contained in the confidence interval for the mean
winter weight.
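The two intervals in 9.47 use the same recipe: sample mean plus or minus (t critical value) times s divided by the square root of n. A minimal sketch, with an illustrative function name and the critical value supplied by hand (1.99, the estimate used in the solution) rather than computed:

```python
import math

def mean_ci(x_bar, s, n, t_crit):
    """One-sample t confidence interval for a population mean."""
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Exercise 9.47: summer (x-bar = 183, s = 20) and winter (x-bar = 190, s = 23), n = 100.
for label, x_bar, s in (("summer", 183, 20), ("winter", 190, 23)):
    low, high = mean_ci(x_bar, s, 100, 1.99)
    print(f"{label}: ({low:.3f}, {high:.3f})")
```

A statistics package would use the exact critical value for 99 degrees of freedom, so its endpoints would differ slightly in the later decimal places.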


9.49 A boxplot of the sample values is shown below.

[Boxplot of the sample values; horizontal axis: Fat Content (grams)]

The boxplot shows that the distribution of the sample values is negatively skewed, and this leads
us to suspect that the population is not approximately normally distributed. Therefore, since the
sample is small, it is not appropriate to use the t confidence interval method of this section.


9.51 A reasonable estimate of $\sigma$ is given by (sample range)/4 = (700 - 50)/4 = 162.5. Thus

$$n = \left(\frac{1.96\,\sigma}{B}\right)^2 = \left(\frac{1.96 \cdot 162.5}{10}\right)^2 = 1014.4225.$$

So we need a sample size of 1015.
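The arithmetic in 9.51 can be reproduced in a few lines. This is a sketch with illustrative names; it uses the range/4 rule of thumb from the solution as the guess for the standard deviation.

```python
import math

def sample_size_for_mean(sigma, z_crit, bound):
    """Smallest n for which the error bound on x-bar is at most `bound`."""
    raw = (z_crit * sigma / bound) ** 2
    # Round away floating-point noise before rounding up to a whole number.
    return raw, math.ceil(round(raw, 6))

sigma_guess = (700 - 50) / 4  # range/4 estimate of the standard deviation
raw, n_required = sample_size_for_mean(sigma_guess, 1.96, 10)
print(round(raw, 4), n_required)
```

Because the required n grows with the square of sigma/B, halving the bound B quadruples the sample size.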


9.53. First, we need to know that the information is based on a random sample of middle-income 
consumers aged 65 and older. Second, it would be useful if some sort of margin of error were 
given for the estimated mean of $10,235. 


9.55 a The paper states that queens flew for an average of 24.2 ± 9.21 minutes on their mating
flights, and so this interval is a confidence interval for a population mean.


b Conditions 

1. Since $n = 30 \geq 30$, the sample size is large enough.

2. We are told to assume that the 30 queen honeybees are representative of the population of
queen honeybees. It is then reasonable to treat the sample as a random sample from the
population.

Calculation

The 95% confidence interval for $\mu$ is

$$\bar{x} \pm (t\text{ critical value})\cdot\frac{s}{\sqrt{n}} = 4.6 \pm 2.05\cdot\frac{3.47}{\sqrt{30}} = (3.301, 5.899).$$

Interpretation

We are 95% confident that the mean number of partners is between 3.301 and 5.899.


9.59


9.61


9.63


Check of Conditions 


1. Since $n\hat{p} = 52(18/52) = 18 \geq 10$ and $n(1-\hat{p}) = 52(34/52) = 34 \geq 10$, the sample size is large
enough.

2. The sample size of n = 52 is much smaller than 10% of the population size (the number of 
young adults with pierced tongues). 

3. The assumption we have made is that the sample of 52 was a random sample from the 
population of young adults with pierced tongues. 

Calculation 

The 95% confidence interval for p is 


$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{18}{52} \pm 1.96\sqrt{\frac{(18/52)(34/52)}{52}} = (0.217, 0.475).$$


Interpretation 
We are 95% confident that the proportion of all young people with pierced tongues who have 
receding gums is between 0.217 and 0.475. 


The standard error for the mean cost for Native Americans is much larger than that for Hispanics 
since the sample size was much smaller for Native Americans. 


Check of Conditions 

1. Since $n\hat{p} = 150(0.65) = 97.5 \geq 10$ and $n(1-\hat{p}) = 150(0.35) = 52.5 \geq 10$, the sample size is
large enough.

2. The sample size of n = 150 is much smaller than 10% of the population size (the number of
Utah residents).

3. We are told to assume that the sample was a random sample from the population.

Calculation

The 90% confidence interval for $p$ is

$$\hat{p} \pm 1.645\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.65 \pm 1.645\sqrt{\frac{(0.65)(0.35)}{150}} = (0.586, 0.714).$$


Interpretation 
We are 90% confident that the proportion of all Utah residents who favor fluoridation is between 


0.586 and 0.714. 


Yes. Since the whole of this interval is above 0.5, the interval is consistent with the statement that 
fluoridation is favored by a clear majority of Utah residents. 


Check of Conditions 

1. Since $n\hat{p} = 750(125/750) = 125 \geq 10$ and $n(1-\hat{p}) = 750(625/750) = 625 \geq 10$, the sample size
is large enough.

2. The sample size of n = 750 is much smaller than 10% of the population size (the number of
full-time workers).

3. We are told to assume that the sample is a random sample from the population of full-time
workers.

Calculation

The 90% confidence interval for $p$ is

$$\hat{p} \pm 1.645\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{125}{750} \pm 1.645\sqrt{\frac{(125/750)(625/750)}{750}} = (0.144, 0.189).$$

Interpretation
We are 90% confident that the proportion of all full-time workers who have been so angered in 


the last year that they wanted to hit a colleague is between 0.144 and 0.189. 


9.65 $$n = p(1-p)\left(\frac{1.96}{B}\right)^2 = (0.5)(0.5)\left(\frac{1.96}{0.1}\right)^2 = 96.04.$$ A sample size of 97 is required.

9.67 $$n = \left(\frac{1.96\,\sigma}{B}\right)^2 = 245.862.$$ A sample size of 246 is required.

9.69 The 99% upper confidence bound for the mean wait time for bypass surgery is
$19 + 2.33(10/\sqrt{539}) = 20.004$ days.


9.71 The 95% confidence interval for the population standard deviation of wait time (in days) for
angiography is

$$s \pm 1.96\cdot\frac{s}{\sqrt{2n}} = 9 \pm 0.429 = (8.571, 9.429).$$


9.73 Conditions 
1. We have to assume that the distribution of the time taken to eat a frog over all Indian false 
vampire bats is normally distributed. 
2. We have to assume, also, that the sample of 12 bats is a random sample from the population 
of Indian false vampire bats. 
Calculation 
The 90% confidence interval for $\mu$ is

$$\bar{x} \pm (t\text{ critical value})\cdot\frac{s}{\sqrt{n}} = 21.9 \pm 1.80\cdot\frac{7.7}{\sqrt{12}} = (17.899, 25.901).$$

Interpretation
We are 90% confident that the mean suppertime for a vampire bat whose meal consists of a frog
is between 17.899 and 25.901 minutes.


Chapter 10
Hypothesis Testing Using a Single Sample


Note: In this chapter, numerical answers to questions involving the normal and t distributions were found
using values from a calculator. Students using statistical tables will find that their answers differ slightly
from those given.


10.1 Legitimate hypotheses concern population characteristics; $\bar{x}$ is a sample statistic.

10.3 Because so much is at stake at a nuclear power plant, the inspection team needs to obtain
convincing evidence that everything is in order. To put this another way, the team needs not only
to obtain a sample mean greater than 100 but, beyond that, to be sure that sample mean is
sufficiently far above 100 to provide convincing evidence that the true mean weld strength is
greater than 100. Hence an alternative hypothesis of $H_a: \mu > 100$ will be used.

10.5 We are clearly talking here about a situation where, in a sample of children who had received the
MMR vaccine, a higher incidence of autism was observed than the incidence of autism in
children in general. The process of the hypothesis test is then to assume that the incidence of
autism is the same amongst the population of children who have had the MMR vaccine as it is
amongst children in general, and then to find out whether, on that basis, a result such as the one
obtained in the sample would be very unusual, or not particularly unusual. If such a result would
be very unusual, then the sample result is providing convincing evidence of a higher incidence of
autism amongst the population of children who have received the MMR vaccine than in children
in general. If the sample result would not be particularly unusual, then it would not provide
convincing evidence of this. However, since the incidence of autism amongst children in the
sample was observed to be higher than it is known to be in children in general, there’s no way
that this result can provide evidence that MMR does not cause autism.

10.7 We assume that the program director will continue with the station’s current programming unless
there is convincing evidence that more than half of the potential viewers prefer a return to the
regular programming. Thus, letting $p$ be the proportion of all potential viewers who would prefer
a return to the regular programming, the program director should test $H_0: p = 0.5$ versus
$H_a: p > 0.5$.

10.9 Let $p$ be the proportion of all constituents who favor spending money for the new sewer system.
She should test $H_0: p = 0.5$ versus $H_a: p > 0.5$.

10.11 Let $\mu$ be the population mean amperage at which the fuses burn out. Action will need to be taken
if the data provide convincing evidence that either $\mu < 40$ or $\mu > 40$. Thus the manufacturer
should test $H_0: \mu = 40$ versus $H_a: \mu \neq 40$.

10.13 a This is a Type I error. Its probability is 3/33 = 0.091.

b A Type II error would be coming to the conclusion that the woman has cancer in the other
breast when in fact she does not have cancer in the other breast. The probability that this
happens is 91/936 = 0.097.

10.15 a A Type I error would be coming to the conclusion that the man is not the father when in fact
he is. A Type II error would be not coming to the conclusion that the man is not the father
when in fact he is not the father.

b $\alpha = 0.001$, $\beta = 0$.

c A “false positive” is coming to the conclusion that the man is the father when in fact he is not
the father. This is a Type II error, and its probability is $\beta = 0.008$.


10.17 a A Type I error is obtaining convincing evidence that more than 1% of a shipment is defective
when in fact (at most) 1% of the shipment is defective. A Type II error is not obtaining
convincing evidence that more than 1% of a shipment is defective when in fact more than 1%
of the shipment is defective.


b The consequence of a Type I error would be that the calculator manufacturer returns a 
shipment when in fact it was acceptable. This will do minimal harm to the calculator 
manufacturer’s business. However, the consequence of a Type II error would be that the 
calculator manufacturer would go ahead and use in the calculators circuits that are defective. 
This will then lead to faulty calculators and would therefore be harmful to the manufacturer’s 
business. A Type II error would be the more serious for the calculator manufacturer. 


c At least in the short term, a Type II error would not be harmful to the supplier’s business;
payment would be received for a shipment that was in fact faulty. However, if a Type I error 
were to occur, the supplier would receive back, and not be paid for, a shipment of circuits that 
was in fact acceptable. A Type I error would be the more serious for the supplier. 


10.19 a Before filing charges of false advertising against the company, the consumer advocacy group 
would require convincing evidence that more than 10% of the flares are defective. 


b A Type I error is coming to the conclusion that more than 10% of the flares are defective
when in fact 10% (or fewer) of the flares are defective. This would result in the expensive 
and time-consuming process of filing charges of false advertising against the company when 
in fact the company is not at fault. A Type II error is not coming to the conclusion that more 
than 10% of the flares are defective when in fact more than 10% of the flares are defective. 
As a result the consumer advocacy group would not file charges when in fact the company 
was at fault. 


10.21 a The researchers failed to reject $H_0$.

b If the researchers were incorrect in their conclusion, then they would be failing to reject $H_0$
when $H_0$ was in fact false. This is a Type II error.

c Yes. The study did not provide convincing evidence that there is a higher cancer death rate
for people who live close to nuclear facilities. However, this does not mean that there was no
such effect, and this would be the case for any study with the same outcome.

10.23 a A P-value of 0.0003 means that it is very unlikely (probability = 0.0003), assuming that $H_0$ is
true, that you would get a sample result at least as inconsistent with $H_0$ as the one obtained in
the study. Thus $H_0$ is rejected.

b A P-value of 0.350 means that it is not particularly unlikely (probability = 0.350), assuming
that $H_0$ is true, that you would get a sample result at least as inconsistent with $H_0$ as the one
obtained in the study. Thus there is no reason to reject $H_0$.

10.25 a $H_0$ is not rejected.

b $H_0$ is not rejected.

c $H_0$ is not rejected.

d $H_0$ is rejected.

e $H_0$ is not rejected.

f $H_0$ is not rejected.


10.27 a The large-sample z test is not appropriate since $np = 25(0.2) = 5 < 10$.

b The large-sample z test is appropriate since $np = 210(0.6) = 126 \geq 10$ and
$n(1-p) = 210(0.4) = 84 \geq 10$.

c The large-sample z test is appropriate since $np = 100(0.9) = 90 \geq 10$ and
$n(1-p) = 100(0.1) = 10 \geq 10$.

d The large-sample z test is not appropriate since $np = 75(0.05) = 3.75 < 10$.


10.29 a 1. $p$ = proportion of all women who work full time, age 22 to 35, who would be willing to
give up some personal time in order to make more money.

2. $H_0: p = 0.5$

3. $H_a: p > 0.5$

4. $\alpha = 0.01$

5. $$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{\hat{p} - 0.5}{\sqrt{\frac{(0.5)(0.5)}{1000}}}$$

6. The sample was selected in a way that was designed to produce a sample that was
representative of women in the targeted group, so it is reasonable to treat the sample as a
random sample from the population. The sample size is much smaller than the population
size (the number of women age 22 to 35 who work full time). Furthermore,
$np = 1000(0.5) = 500 \geq 10$ and $n(1-p) = 1000(0.5) = 500 \geq 10$, so the sample is large
enough. Therefore the large sample test is appropriate.

7. $$z = \frac{540/1000 - 0.5}{\sqrt{\frac{(0.5)(0.5)}{1000}}} = 2.52982$$

8. P-value $= P(Z > 2.52982) = 0.00571$

9. Since P-value = 0.00571 < 0.01 we reject $H_0$. We have convincing evidence that a
majority of women age 22 to 35 who work full time would be willing to give up some
personal time for more money.


b No. The survey only covered women age 22 to 35.
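The test statistic and P-value in 10.29 can be computed directly. The following is a minimal sketch of an upper-tailed large-sample z test for a proportion; the function names are illustrative, and the normal tail probability is obtained from the complementary error function rather than from a table.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal random variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def one_proportion_z_test(successes, n, p0):
    """Upper-tailed large-sample z test of H0: p = p0 versus Ha: p > p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # hypothesized p0 in the denominator
    return z, upper_tail(z)

# Exercise 10.29: 540 of 1000 women, hypothesized proportion 0.5, alpha = 0.01.
z, p_value = one_proportion_z_test(540, 1000, 0.5)
print(round(z, 5), round(p_value, 5))
print("reject H0" if p_value < 0.01 else "fail to reject H0")
```

Note that the standard deviation in the denominator uses the hypothesized value of p, not the sample proportion, which is what distinguishes the test statistic from the confidence interval calculation in Chapter 9.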


10.31 1. $p$ = proportion of all adult Americans who would prefer to live in a hot climate rather than a
cold climate

2. $H_0: p = 0.5$

3. $H_a: p > 0.5$

4. $\alpha = 0.01$

5. $$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{\hat{p} - 0.5}{\sqrt{\frac{(0.5)(0.5)}{2260}}}$$

6. The sample was nationally representative, so it is reasonable to treat the sample as a random
sample from the population. The sample size is much smaller than the population size (the
number of adult Americans). Furthermore, $np = 2260(0.5) = 1130 \geq 10$ and
$n(1-p) = 2260(0.5) = 1130 \geq 10$, so the sample is large enough. Therefore the large sample
test is appropriate.

7. $$z = \frac{1288/2260 - 0.5}{\sqrt{\frac{(0.5)(0.5)}{2260}}} = 6.64711$$

8. P-value $= P(Z > 6.64711) \approx 0$

9. Since P-value $\approx 0 < 0.01$ we reject $H_0$. We have convincing evidence that a majority of adult
Americans would prefer a hot climate over a cold climate.


10.33 1. $p$ = proportion of all American adults who oppose reinstatement of the draft

2. $H_0: p = 2/3$

3. $H_a: p > 2/3$

4. $\alpha = 0.05$

5. $$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{\hat{p} - 2/3}{\sqrt{\frac{(2/3)(1/3)}{1000}}}$$

6. The sample was a random sample from the population. The sample size is much smaller than
the population size (the number of American adults). Furthermore, $np = 1000(2/3) = 667 \geq 10$
and $n(1-p) = 1000(1/3) = 333 \geq 10$, so the sample is large enough. Therefore the large
sample test is appropriate.

7. $$z = \frac{700/1000 - 2/3}{\sqrt{\frac{(2/3)(1/3)}{1000}}} = 2.23607$$

8. P-value $= P(Z > 2.23607) = 0.01267$

9. Since P-value = 0.01267 < 0.05 we reject $H_0$. We have convincing evidence that more than
two-thirds of American adults oppose reinstatement of the draft.


10.35 1. $p$ = proportion of all cell phone users in 2004 who had received commercial messages or ads
2. $H_0: p = 0.13$
3. $H_a: p > 0.13$
4. $\alpha = 0.05$
5. $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 0.13}{\sqrt{(0.13)(0.87)/5500}}$
6. The sample size is much smaller than the population size (the number of cell phone users in
2004). Furthermore, $np = 5500(0.13) = 715 \ge 10$ and $n(1-p) = 5500(0.87) = 4785 \ge 10$, so
the sample is large enough. Therefore, if we assume that the sample was a random sample
from the population, the large sample test is appropriate.
7. $z = \frac{0.2 - 0.13}{\sqrt{(0.13)(0.87)/5500}} = 15.436$
8. P-value $= P(Z > 15.436) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the proportion of
cell phone users in 2004 who had received commercial messages or ads is more than 0.13.


10.37 1. $p$ = proportion of all adult Americans who believe that playing the lottery would be the best
way of accumulating $200,000 in net wealth
2. $H_0: p = 0.2$
3. $H_a: p > 0.2$
4. $\alpha = 0.05$
5. $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 0.2}{\sqrt{(0.2)(0.8)/1000}}$
6. We are told to assume that the sample was a random sample from the population. The sample
size is much smaller than the population size (the number of adult Americans). Furthermore,
$np = 1000(0.2) = 200 \ge 10$ and $n(1-p) = 1000(0.8) = 800 \ge 10$, so the sample is large
enough. Therefore the large sample test is appropriate.
7. $z = \frac{210/1000 - 0.2}{\sqrt{(0.2)(0.8)/1000}} = 0.79057$
8. P-value $= P(Z > 0.79057) = 0.21460$
9. Since P-value $= 0.21460 > 0.05$ we do not reject $H_0$. We do not have convincing evidence
that more than 20% of adult Americans believe that playing the lottery would be the best
strategy for accumulating $200,000 in net wealth.


10.39 a 1. $p$ = proportion of all adult Americans who believe that the quality of movies being
produced is getting worse
2. $H_0: p = 0.5$
3. $H_a: p < 0.5$
4. $\alpha = 0.05$
5. $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 0.5}{\sqrt{(0.5)(0.5)/1000}}$
6. The sample was a random sample from the population. The sample size is much smaller
than the population size (the number of adult Americans). Furthermore,
$np = 1000(0.5) = 500 \ge 10$ and $n(1-p) = 1000(0.5) = 500 \ge 10$, so the sample is large
enough. Therefore the large sample test is appropriate.
7. $z = \frac{470/1000 - 0.5}{\sqrt{(0.5)(0.5)/1000}} = -1.89737$
8. P-value $= P(Z < -1.89737) = 0.02889$
9. Since P-value $= 0.02889 < 0.05$ we reject $H_0$. We have convincing evidence that fewer
than half of adult Americans believe that movie quality is decreasing.


b The conditions for performing the test would all still be satisfied. The test statistic would now
be $z = \frac{47/100 - 0.5}{\sqrt{(0.5)(0.5)/100}} = -0.6$,
which gives P-value $= P(Z < -0.6) = 0.274 > 0.05$. So in this case, no, we do not have
convincing evidence that fewer than half of adult Americans believe that movie quality is
decreasing.


c Both results suggest that fewer than half of adult Americans believe that movie quality is
getting worse. However, getting 470 out of 1000 people responding this way (as opposed to
47 out of 100) provides much stronger evidence of this fact.
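The contrast between 470 out of 1000 and 47 out of 100 can be made concrete by computing both test statistics side by side. This is an illustrative Python sketch only; the helper name `lower_tail_z_test` is hypothetical, and it assumes the lower-tailed test of $p = 0.5$ described in this exercise.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_z_test(successes, n, p0):
    """z statistic and lower-tail P-value for H0: p = p0 versus Ha: p < p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return z, NormalDist().cdf(z)


# Same sample proportion (0.47), different sample sizes.
z_large, p_large = lower_tail_z_test(470, 1000, 0.5)
z_small, p_small = lower_tail_z_test(47, 100, 0.5)

print(f"n = 1000: z = {z_large:.5f}, P-value = {p_large:.5f}")
print(f"n = 100:  z = {z_small:.5f}, P-value = {p_small:.5f}")
```

Because the standard error shrinks in proportion to $1/\sqrt{n}$, the same sample proportion sits about 1.9 standard errors below 0.5 when $n = 1000$ but only 0.6 standard errors below when $n = 100$.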


10.41 The “38%” value given in the article is a proportion of all felons; in other words, it is a
population proportion. Therefore we know that the population proportion is less than 0.4, and 
there is no need for a hypothesis test. 


10.43 a P-value $= 2 \cdot P(t > 0.73) = 0.484$.
b P-value $= P(t > -0.5) = 0.686$.
c P-value $= P(t < -2.1) = 0.025$.
d P-value $= P(t < -5.1) = 0.000$.
e P-value $= 2 \cdot P(t > 1.7) = 0.097$.


10.45 a P-value $= P(t < -2.3) = 0.017 < 0.05$. $H_0$ is rejected. We have convincing evidence that the
mean writing time for all pens of this type is less than 10 hours.


b P-value $= P(t < -1.83) = 0.042 > 0.01$. $H_0$ is not rejected. We do not have convincing
evidence that the mean writing time for all pens of this type is less than 10 hours.


c Since $t$ is positive, the sample mean must have been greater than 10. Therefore, we certainly
do not have convincing evidence that the mean writing time for all pens of this type is less
than 10 hours. $H_0$ is certainly not rejected.
10.47 a 1. $\mu$ = mean heart rate after 15 minutes of Wii Bowling for all boys age 10 to 13
2. $H_0: \mu = 98$
3. $H_a: \mu \ne 98$
4. $\alpha = 0.01$
5. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 98}{s/\sqrt{n}}$
6. We are told to assume that it is reasonable to regard the sample of boys as representative
of boys age 10 to 13. Under this assumption, it is reasonable to treat the sample as a
random sample from the population. We are also told to assume that the distribution of
heart rates after 15 minutes of Wii Bowling is approximately normal. So we can proceed
with the $t$ test.
7. $t = \frac{101 - 98}{15/\sqrt{14}} = 0.74833$
8. P-value $= 2 \cdot P(t_{13} > 0.74833) = 0.468$
9. Since P-value $= 0.468 > 0.01$ we do not reject $H_0$. We do not have convincing evidence
that the mean heart rate after 15 minutes of Wii Bowling is not equal to 98 beats per
minute.


b 1. $\mu$ = mean heart rate after 15 minutes of Wii Bowling for all boys age 10 to 13
2. $H_0: \mu = 66$
3. $H_a: \mu > 66$
4. $\alpha = 0.01$
5. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 66}{s/\sqrt{n}}$
6. We are told to assume that it is reasonable to regard the sample of boys as representative
of boys age 10 to 13. Under this assumption, it is reasonable to treat the sample as a
random sample from the population. We are also told to assume that the distribution of
heart rates after 15 minutes of Wii Bowling is approximately normal. So we can proceed
with the $t$ test.
7. $t = \frac{101 - 66}{15/\sqrt{14}} = 8.731$
8. P-value $= P(t_{13} > 8.731) \approx 0$
9. Since P-value $\approx 0 < 0.01$ we reject $H_0$. We have convincing evidence that the mean heart
rate after 15 minutes of Wii Bowling is greater than 66 beats per minute.


c It is known that treadmill walking raises the heart rate over the resting heart rate, and the
study provided convincing evidence that Wii Bowling does so, also. Although the sample
mean heart rate for Wii Bowling was higher than the known population mean heart rate for
treadmill walking, the study did not provide convincing evidence of a difference of the
population mean heart rate for Wii Bowling from the known population mean for the
treadmill.


10.49 1. $\mu$ = mean salary offering for accounting graduates at this university
2. $H_0: \mu = 48722$
3. $H_a: \mu > 48722$
4. $\alpha = 0.05$
5. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 48722}{s/\sqrt{n}}$
6. The sample was a random sample from the population. Also, $n = 50 \ge 30$. So we can proceed
with the $t$ test.
7. $t = \frac{49850 - 48722}{3300/\sqrt{50}} = 2.41702$
8. P-value $= P(t_{49} > 2.41702) = 0.010$
9. Since P-value $= 0.010 < 0.05$ we reject $H_0$. We have convincing evidence that the mean
salary offer for accounting graduates of this university is higher than the 2010 national
average of $48,722.


10.51 1. $\mu$ = mean number of credit cards carried by undergraduates
2. $H_0: \mu = 4.09$
3. $H_a: \mu < 4.09$
4. $\alpha = 0.05$
5. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 4.09}{s/\sqrt{n}}$
6. The sample was a random sample from the population. Also, $n = 132 \ge 30$. So we can
proceed with the $t$ test.
7. $t = \frac{2.6 - 4.09}{1.2/\sqrt{132}} = -14.266$
8. P-value $= P(t_{131} < -14.266) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the mean number
of credit cards carried by undergraduates is less than the credit bureau’s figure of 4.09.


10.53 1. $\mu$ = mean minimum purchase amount for which Canadians consider it acceptable to use a
debit card
2. $H_0: \mu = 10$
3. $H_a: \mu < 10$
4. $\alpha = 0.01$
5. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 10}{s/\sqrt{n}}$
6. The sample was a random sample from the population. Also, $n = 2000 \ge 30$. So we can
proceed with the $t$ test.
7. $t = \frac{\bar{x} - 10}{7.6/\sqrt{2000}} = -5.001$
8. P-value $= P(t_{1999} < -5.001) \approx 0$
9. Since P-value $\approx 0 < 0.01$ we reject $H_0$. We have convincing evidence that the mean minimum
purchase amount for which Canadians consider it acceptable to use a debit card is less than
$10.


10.55 a 1. $\mu$ = mean weekly time spent using the Internet by Canadians
2. $H_0: \mu = 12.5$
3. $H_a: \mu > 12.5$
4. $\alpha = 0.05$
5. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 12.5}{s/\sqrt{n}}$
6. The sample was a random sample from the population. Also, $n = 1000 \ge 30$. So we can
proceed with the $t$ test.
7. $t = \frac{12.7 - 12.5}{5/\sqrt{1000}} = 1.26491$
8. P-value $= P(t_{999} > 1.26491) = 0.103$
9. Since P-value $= 0.103 > 0.05$ we do not reject $H_0$. We do not have convincing evidence
that the mean weekly time spent using the Internet by Canadians is greater than 12.5
hours.


b Now $t = \frac{12.7 - 12.5}{2/\sqrt{1000}} = 3.16228$, which gives P-value $= P(t_{999} > 3.16228) = 0.001$. Since
P-value $= 0.001 < 0.05$ we reject $H_0$. We have convincing evidence that the mean weekly
time spent using the Internet by Canadians is greater than 12.5 hours.


c The sample standard deviation of 2 in Part (b) means that the population of weekly Internet
times has a standard deviation of around 2. Likewise, the sample standard deviation of 5 in
Part (a) means that the population of weekly Internet times has a standard deviation of around
5. Assuming that the population of weekly Internet times has a mean of 12.5, it is far less
likely to get a sample mean of 12.7 if the population standard deviation is 2 than if the
population standard deviation is 5, since greater deviations from the mean are expected when
the population standard deviation is larger. This explains why $H_0$ is rejected when the sample
standard deviation is 2, but not when the sample standard deviation is 5.
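The effect of the sample standard deviation on the test statistic can be seen by computing $t$ for both values of $s$. The sketch below is illustrative only: the function name `one_sample_t` is hypothetical, and because the Python standard library has no $t$ distribution, the P-values are approximated with the standard normal distribution, an assumption that is reasonable here only because there are 999 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_t(x_bar, mu0, s, n):
    """One-sample t statistic for H0: mu = mu0."""
    return (x_bar - mu0) / (s / sqrt(n))


# Same sample mean and sample size, different sample standard deviations.
t_a = one_sample_t(12.7, 12.5, 5, 1000)   # Part (a): s = 5
t_b = one_sample_t(12.7, 12.5, 2, 1000)   # Part (b): s = 2

# Upper-tail P-values, approximated by the standard normal curve
# (an approximation to the t distribution with 999 degrees of freedom).
p_a = 1 - NormalDist().cdf(t_a)
p_b = 1 - NormalDist().cdf(t_b)

print(f"s = 5: t = {t_a:.5f}, approximate P-value = {p_a:.3f}")
print(f"s = 2: t = {t_b:.5f}, approximate P-value = {p_b:.3f}")
```

Dividing $s$ by 2.5 multiplies $t$ by 2.5, which is enough to move the result from not significant to significant at the 0.05 level.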


10.57 a Yes. Since the pattern in the normal probability plot is roughly linear, and since the sample
was a random sample from the population, the $t$ test is appropriate.


b The boxplot shows a median of around 245, and since the distribution is roughly symmetrical,
this tells us that the sample mean is around 245, also. This might initially suggest
that the population mean differs from 240. But when you consider the fact that the sample is
relatively small, and that the sample values range all the way from 225 to 265, you realize
that such a sample mean would still be feasible if the population mean were 240.


c 1. $\mu$ = mean calorie content for frozen dinners of this type
2. $H_0: \mu = 240$
3. $H_a: \mu \ne 240$
4. $\alpha = 0.05$
5. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 240}{s/\sqrt{n}}$
6. As explained in Part (a), the conditions for performing the $t$ test are met.
7. The mean and standard deviation of the sample values are 244.33333 and 12.38278,
respectively. So $t = \frac{244.33333 - 240}{12.38278/\sqrt{12}} = 1.21226$.
8. P-value $= 2 \cdot P(t_{11} > 1.21226) = 0.251$
9. Since P-value $= 0.251 > 0.05$ we do not reject $H_0$. We do not have convincing evidence
that the mean calorie content for frozen dinners of this type differs from 240.


10.59 a _ Increasing the sample size increases the power. 
b Increasing the significance level increases the power. 
10.61 a $\alpha$ = area under standard normal curve to the left of $-1.28$ = 0.1.


b When $z = -1.28$, $\bar{x} = 10 + (-1.28)(0.1) = 9.872$. So $H_0$ is rejected for values of $\bar{x} \le 9.872$.
If $\mu = 9.8$, then $\bar{x}$ is normally distributed with mean 9.8 and standard deviation 0.1. So
P($H_0$ is rejected) $= P(\bar{x} \le 9.872)$
= area under standard normal curve to left of (9.872 - 9.8)/0.1
= area under standard normal curve to left of 0.72
= 0.7642.
So $\beta$ = P($H_0$ not rejected) = 1 - 0.7642 = 0.2358.


c $H_0$ states that $\mu = 10$ and $H_a$ states that $\mu < 10$. Since 9.5 is further from 10 (in the direction
indicated by $H_a$), $\beta$ is less for $\mu = 9.5$ than for $\mu = 9.8$.
For $\mu = 9.5$,
P($H_0$ is rejected) $= P(\bar{x} \le 9.872)$
= area under standard normal curve to left of (9.872 - 9.5)/0.1
= area under standard normal curve to left of 3.72
= 0.9999.
So $\beta$ = P($H_0$ not rejected) = 1 - 0.9999 = 0.0001.


d Power when $\mu = 9.8$ is 1 - 0.2358 = 0.7642.
Power when $\mu = 9.5$ is 1 - 0.0001 = 0.9999.
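The Type II error probabilities and powers in Parts (b) through (d) follow from a single cutoff value, and the calculation can be sketched in Python as a check. This is an illustration rather than part of the original solution; the names `cutoff`, `power_at`, and `beta_at` are hypothetical, and the values 10, 0.1, and $-1.28$ are taken from the working above.

```python
from statistics import NormalDist

mu0 = 10.0          # hypothesized mean under H0
sigma_xbar = 0.1    # standard deviation of the sample mean
z_crit = -1.28      # H0 is rejected when z <= -1.28 (lower-tailed test)

# Rejection region expressed in terms of the sample mean.
cutoff = mu0 + z_crit * sigma_xbar


def power_at(mu_true):
    """P(H0 is rejected) when the true mean is mu_true."""
    return NormalDist(mu_true, sigma_xbar).cdf(cutoff)


def beta_at(mu_true):
    """P(Type II error) = P(H0 is not rejected) when the true mean is mu_true."""
    return 1 - power_at(mu_true)


for mu_true in (9.8, 9.5):
    print(f"mu = {mu_true}: power = {power_at(mu_true):.4f}, beta = {beta_at(mu_true):.4f}")
```

Evaluating `power_at` over a range of alternative means would trace out the power curve for this test; the further the true mean lies below 10, the closer the power is to 1.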


10.63 a $t = \frac{0.0372 - 0.035}{0.0125/\sqrt{7}} = 0.46565$. So P-value $= P(t_6 > 0.46565) = 0.329 > 0.05$. Therefore, $H_0$ is
not rejected, and we do not have convincing evidence that $\mu > 0.035$.


b $d = \frac{|0.04 - 0.035|}{0.0125} = 0.4$. Using Appendix Table 5, for a one-tailed test, $\alpha = 0.05$, 6 degrees
of freedom, we get $\beta \approx 0.75$.


c Power $\approx 1 - 0.75 = 0.25$.


10.65 Using Appendix Table 5: 


0.52 -0.5| 

4 i — a elliot By) =e : 

0.02 Pee 

ay, [0.48—0.5| _ a 
0.02 
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_ [052-05] _, athe 
= Fegon mentee ee 
|0.54 0.5 
d 19 cm ES SS = 
Cooma eet > 


|0.54-0.5| 
d='——_"l=1 8 = 0.04. 
e ay, 1, B=0.04 


_ |0.54-0.5] 
= 9004823 © 


f. ig 1, B~0.01. 


g Comparing Part (b) with Part (a), it makes sense that true values of $\mu$ equal distances to the
right and left of the hypothesized value will give equal probabilities of a Type II error.
Comparing Part (c) with Part (a), it makes sense that a smaller significance level will give a
larger probability of a Type II error (since with a smaller significance level you are less likely
to reject $H_0$, and therefore more likely to fail to reject $H_0$).

Comparing Part (d) with Part (a), it makes sense that an alternative value of $\mu$ further from
the hypothesized value will give a smaller probability of a Type II error (since the test is more
likely to correctly detect a true value of $\mu$ that is further from the hypothesized value of $\mu$).
Comparing Part (e) with Part (a), it makes sense that an alternative value of $\mu$ twice as far
from the hypothesized value combined with a population standard deviation that is twice as
large will give the same probability of a Type II error.

Comparing Part (f) with Part (e), it makes sense that a larger sample size will give a smaller
probability of a Type II error (since a larger sample is more likely to detect that the true value
of $\mu$ is not equal to the hypothesized value of $\mu$).


10.67 a 1. $p$ = proportion of all women who would like to choose a baby’s sex who would choose a
girl.
2. $H_0: p = 0.5$
3. $H_a: p \ne 0.5$
4. $\alpha = 0.05$
5. $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 0.5}{\sqrt{(0.5)(0.5)/229}}$
6. We need to assume that the sample was a random sample from the population of women
who would like to choose the sex of a baby. The sample size is presumably much smaller
than the population size (the number of women who would like to choose the sex of a
baby). Also, $np = 229(0.5) = 114.5 \ge 10$ and $n(1-p) = 229(0.5) = 114.5 \ge 10$, so the
sample is large enough. Therefore the large sample test is appropriate.
7. $z = \frac{140/229 - 0.5}{\sqrt{(0.5)(0.5)/229}} = 3.370$
8. P-value $= P(Z > 3.370) = 0.0004$
9. Since P-value $= 0.0004 < 0.05$ we reject $H_0$. We have convincing evidence that the
proportion of all women who would like to choose a baby’s sex who would choose a girl
is not equal to 0.5. (This contradicts the statement in the article.)


b The survey was conducted only on women who had visited the Center for Reproductive 
Medicine at Brigham and Women’s Hospital. It is quite possible that women who choose this 
institution have views on the matter that are different from those of women in general. (This 
is selection bias.) Also, with only 561 of the 1385 women responding, it is quite possible that 
the majority who did not respond had different views from those who did. (This is 
nonresponse bias.) For these two reasons it seems unreasonable to generalize the results to a 
larger population. 


10.69 1. $p$ = proportion of all U.S. adults who believe that rudeness is a worsening problem
2. $H_0: p = 0.75$
3. $H_a: p < 0.75$
4. $\alpha = 0.05$
5. $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 0.75}{\sqrt{(0.75)(0.25)/2013}}$
6. We need to assume that the sample was a random sample from the population of U.S. adults.
The sample size is much smaller than the population size (the number of U.S. adults).
Furthermore, $np = 2013(0.75) = 1509.75 \ge 10$ and $n(1-p) = 2013(0.25) = 503.25 \ge 10$, so the
sample is large enough. Therefore the large sample test is appropriate.
7. $z = \frac{1283/2013 - 0.75}{\sqrt{(0.75)(0.25)/2013}} = -11.671$
8. P-value $= P(Z < -11.671) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that less than three-
quarters of all U.S. adults believe that rudeness is a worsening problem.
10.71 1. $\mu$ = mean time to distraction for Australian teenage boys
2. $H_0: \mu = 5$
3. $H_a: \mu < 5$
4. $\alpha = 0.01$
5. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 5}{s/\sqrt{n}}$
6. We are told to assume that the sample was a random sample from the population. Also,
$n = 50 \ge 30$. So we can proceed with the $t$ test.
7. $t = \frac{4 - 5}{1.4/\sqrt{50}} = -5.051$
8. P-value $= P(t_{49} < -5.051) \approx 0$
9. Since P-value $\approx 0 < 0.01$ we reject $H_0$. We have convincing evidence that the mean time to
distraction for Australian teenage boys is less than 5 minutes.


10.73 1. $p$ = proportion of all U.S. adults who approve of casino gambling
2. $H_0: p = 2/3$
3. $H_a: p > 2/3$
4. $\alpha = 0.05$
5. $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 2/3}{\sqrt{(2/3)(1/3)/1523}}$
6. We need to assume that the sample selected at random from households with telephones was
a random sample from the population of U.S. adults. The sample size is much smaller than
the population size (the number of U.S. adults). Furthermore, $np = 1523(2/3) = 1015 \ge 10$ and
$n(1-p) = 1523(1/3) = 508 \ge 10$, so the sample is large enough. Therefore the large sample
test is appropriate.
7. $z = \frac{1035/1523 - 2/3}{\sqrt{(2/3)(1/3)/1523}} = 1.06902$
8. P-value $= P(Z > 1.06902) = 0.143$
9. Since P-value $= 0.143 > 0.05$ we do not reject $H_0$. We do not have convincing evidence that
more than two-thirds of all U.S. adults approve of casino gambling.


10.75 1. $p$ = proportion of all U.S. adults who believe that an investment of $25 per week over 40
years with a 7% annual return would result in a sum of over $100,000
2. $H_0: p = 0.4$
3. $H_a: p < 0.4$
4. $\alpha = 0.05$
5. $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 0.4}{\sqrt{(0.4)(0.6)/1010}}$
6. The sample was a random sample from the population of U.S. adults. The sample size is much
smaller than the population size (the number of U.S. adults). Furthermore,
$np = 1010(0.4) = 404 \ge 10$ and $n(1-p) = 1010(0.6) = 606 \ge 10$, so the sample is large
enough. Therefore the large sample test is appropriate.
7. $z = \frac{374/1010 - 0.4}{\sqrt{(0.4)(0.6)/1010}} = -1.92688$
8. P-value $= P(Z < -1.92688) = 0.027$
9. Since P-value $= 0.027 < 0.05$ we reject $H_0$. We have convincing evidence that less than 40%
of all U.S. adults believe that an investment of $25 per week over 40 years with a 7% annual
return would result in a sum of over $100,000.


10.77 a 1. $\mu$ = mean weight for non-top-20 starters
2. $H_0: \mu = 105$
3. $H_a: \mu < 105$
4. $\alpha = 0.05$
5. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 105}{s/\sqrt{n}}$
6. The sample was a random sample from the population. Also, $n = 33 \ge 30$. So we can
proceed with the $t$ test.
7. $t = \frac{103.3 - 105}{16.3/\sqrt{33}} = -0.59913$
8. P-value $= P(t_{32} < -0.59913) = 0.277$
9. Since P-value $= 0.277 > 0.01$ we do not reject $H_0$. We do not have convincing evidence
that the mean weight for non-top-20 starters is less than 105 kg.
10.79 1. $p$ = proportion of all people who would respond if the distributor is fitted with an eye patch
2. $H_0: p = 0.4$
3. $H_a: p > 0.4$
4. $\alpha = 0.05$
5. $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 0.4}{\sqrt{(0.4)(0.6)/200}}$
6. We have to assume that the sample was a random sample from the population of people who
could be approached with a questionnaire. The sample size is much smaller than the
population size. Furthermore, $np = 200(0.4) = 80 \ge 10$ and $n(1-p) = 200(0.6) = 120 \ge 10$, so
the sample is large enough. Therefore the large sample test is appropriate.
7. $z = \frac{109/200 - 0.4}{\sqrt{(0.4)(0.6)/200}} = 4.186$
8. P-value $= P(Z > 4.186) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that more than 40% of
all people who could be approached with a questionnaire will respond when the distributor
wears an eye patch.
10.81 1. $\mu$ = mean daily revenue since the change
2. $H_0: \mu = 75$
3. $H_a: \mu < 75$
4. $\alpha = 0.05$
5. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 75}{s/\sqrt{n}}$
6. The sample was a random sample of days. In order to proceed with the $t$ test we must assume
that the distribution of daily revenues since the change is normal.
7. $t = \frac{70 - 75}{4.2/\sqrt{20}} = -5.324$
8. P-value $= P(t_{19} < -5.324) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the mean daily
revenue has decreased since the price increase.
Cumulative Review Exercises


CR10.1 


Gather a set of volunteer older people with knee osteoarthritis (we will assume that 40 such 
volunteers are available). Have each person rate his/her knee pain on a scale of 1-10, where 10 is 
the worst pain. Randomly assign the volunteers to two groups, A and B, of equal sizes. (This can 
be done by writing the names of the volunteers onto slips of paper. Place the slips into a hat, and 
pick 20 at random. These 20 people will go into Group A, and the remaining 20 people will go 
into Group B.) The volunteers in Group A will attend twice weekly sessions of one hour of tai 
chi. The volunteers in Group B will simply continue with their lives as they usually would. After 
12 weeks, each volunteer should be asked to rate his/her pain on the same scale as before. The
mean reduction in pain for Group A should then be compared to the mean reduction in pain for
Group B.


CR10.3


a [Dotplot: number of flights delayed more than 3 hours; axis from 0 to 84]


There are three airlines that stand out from the rest by having large numbers of delayed
flights. These airlines are ExpressJet, Delta, and Continental, with 93, 81, and 72 delayed
flights, respectively.


b [Dotplot: rate per 100,000 flights; axis from 0.0 to 4.9]


A typical number of flights delayed per 100,000 flights is around 1.1, with most rates lying
between 0 and 1.6. There are four airlines that stand out from the rest by having particularly
high rates, with two of those four having particularly high rates.


c The rate per 100,000 flights data should be used, since this measures the likelihood of any
given flight being late. An airline could stand out in the number of flights delayed data purely
as a result of having a large number of flights.


CR10.5


a The number of people in the sample who change their passwords quarterly is binomially
distributed with $n = 20$ and $p = 0.25$. So, using Appendix Table 9, $p(3) = 0.134$.


b Using Appendix Table 9, P(more than 8 change passwords quarterly)
= 0.027 + 0.010 + 0.003 + 0.001 = 0.041.


c $\mu_x = np = 100(0.25) = 25$, $\sigma_x = \sqrt{np(1-p)} = \sqrt{100(0.25)(0.75)} = 4.330$.


d Since $np = 100(0.25) = 25 \ge 10$ and $n(1-p) = 100(0.75) = 75 \ge 10$ the normal approximation
to the binomial distribution can be used. Thus,
$P(x < 20) \approx P\left(z \le \frac{19.5 - 25}{4.33013}\right) = P(z \le -1.27017) = 0.102$.
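The table lookups and the normal approximation in CR10.5 can be reproduced directly from the binomial formula. The sketch below is illustrative; the helper name `binom_pmf` is hypothetical, and the values $n = 20$, $n = 100$, and $p = 0.25$ are those used in the solution above.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Part (a): exactly 3 of 20 change passwords quarterly.
p_3 = binom_pmf(3, 20, 0.25)

# Part (b): more than 8 of 20, that is, 9 through 20.
p_more_than_8 = sum(binom_pmf(k, 20, 0.25) for k in range(9, 21))

# Parts (c) and (d): mean, standard deviation, and the normal
# approximation to P(x < 20) with a continuity correction (19.5).
n, p = 100, 0.25
mu_x = n * p
sigma_x = sqrt(n * p * (1 - p))
p_fewer_than_20 = NormalDist(mu_x, sigma_x).cdf(19.5)

print(round(p_3, 3), round(p_more_than_8, 3))
print(mu_x, round(sigma_x, 3), round(p_fewer_than_20, 3))
```

Summing `binom_pmf` over 0 through 19 with $n = 100$ would give the exact binomial probability for Part (d), which can be compared with the approximation.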


CR10.7
a $P(O) = 0.4$.


b Anyone who accepts a job offer must have received at least one job offer, so
$P(A) = P(O \cap A) = P(A \mid O)P(O) = (0.45)(0.4) = 0.18$.


c $P(G) = 0.26$.

d $P(A \mid O) = 0.45$.

e Since anyone who accepts a job offer must have received at least one job offer, $P(O \mid A) = 1$.
f $P(A \cap O) = P(A) = 0.18$.


CR10.9
a Check of Conditions

1. Since $n\hat{p} = 115(38/115) = 38 \ge 10$ and $n(1-\hat{p}) = 115(77/115) = 77 \ge 10$, the sample
size is large enough.

2. The sample size of $n = 115$ is much smaller than 10% of the population size (the number
of U.S. medical residents).

3. We are told to regard the sample as a random sample from the population.

Calculation

The 95% confidence interval for $p$ is

$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{38}{115} \pm 1.96\sqrt{\frac{(38/115)(77/115)}{115}} = (0.244, 0.416)$.

Interpretation
We are 95% confident that the proportion of all U.S. medical residents who work
moonlighting jobs is between 0.244 and 0.416.


b Check of Conditions

1. Since $n\hat{p} = 115(22/115) = 22 \ge 10$ and $n(1-\hat{p}) = 115(93/115) = 93 \ge 10$, the sample size
is large enough.

2. The sample size of $n = 115$ is much smaller than 10% of the population size (the number
of U.S. medical residents).

3. We are told to regard the sample as a random sample from the population.

Calculation

The 90% confidence interval for $p$ is

$\hat{p} \pm 1.645\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{22}{115} \pm 1.645\sqrt{\frac{(22/115)(93/115)}{115}} = (0.131, 0.252)$.

Interpretation
We are 90% confident that the proportion of all U.S. medical residents who have credit card
debt of more than $3000 is between 0.131 and 0.252.


c The interval in Part (a) is wider than the interval in Part (b) because the confidence level in
Part (a) (95%) is greater than the confidence level in Part (b) (90%) and because the sample
proportion in Part (a) (38/115) is closer to 0.5 than the sample proportion in Part (b) (22/115).
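Both intervals in CR10.9, and the comparison of their widths in Part (c), can be checked with a short calculation. The following Python sketch is an illustration only; the function name `proportion_ci` is hypothetical, and it obtains the $z$ critical value from the standard normal distribution rather than from a table, so the critical values differ slightly from 1.96 and 1.645 in later decimal places.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    # z critical value capturing the central `confidence` area.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


ci_a = proportion_ci(38, 115, 0.95)   # Part (a): moonlighting jobs
ci_b = proportion_ci(22, 115, 0.90)   # Part (b): credit card debt over $3000

width_a = ci_a[1] - ci_a[0]
width_b = ci_b[1] - ci_b[0]

print([round(v, 3) for v in ci_a], round(width_a, 3))
print([round(v, 3) for v in ci_b], round(width_b, 3))
```

The wider interval in Part (a) reflects both the larger critical value and the larger value of $\hat{p}(1-\hat{p})$ when $\hat{p}$ is closer to 0.5.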
CR10.11
A reasonable estimate of $\sigma$ is given by (sample range)/4 = (20.3 - 19.9)/4 = 0.1. Thus

$n = \left(\frac{1.96\sigma}{B}\right)^2 = \left(\frac{1.96 \cdot 0.1}{0.01}\right)^2 = 384.16$.

So we need a sample size of 385.
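The sample size calculation in CR10.11 amounts to evaluating $(1.96\sigma/B)^2$ and rounding up. A minimal Python sketch follows, with hypothetical variable names; the bound $B = 0.01$ and the range-based estimate of $\sigma$ are the values used in the solution above.

```python
from math import ceil

# Rough estimate of sigma: (sample range) / 4
sigma_estimate = (20.3 - 19.9) / 4

z_star = 1.96   # z critical value for 95% confidence
bound = 0.01    # desired bound on the error of estimation

n_exact = (z_star * sigma_estimate / bound) ** 2
n_required = ceil(n_exact)   # always round up to the next whole observation

print(round(n_exact, 2), n_required)
```

Rounding up rather than to the nearest integer guarantees that the bound on the error is no larger than the one requested.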


CR10.13


1. $p$ = proportion of all baseball fans who believe that the designated hitter rule should either be
expanded to both baseball leagues or eliminated
2. $H_0: p = 0.5$
3. $H_a: p > 0.5$
4. $\alpha = 0.05$
5. $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - 0.5}{\sqrt{(0.5)(0.5)/394}}$
6. The sample was a random sample from the population. The sample size is much smaller than
the population size (the number of baseball fans). Furthermore, $np = 394(0.5) = 197 \ge 10$ and
$n(1-p) = 394(0.5) = 197 \ge 10$, so the sample is large enough. Therefore the large sample test
is appropriate.
7. $z = \frac{272/394 - 0.5}{\sqrt{(0.5)(0.5)/394}} = 7.557$
8. P-value $= P(Z > 7.557) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that a majority of
baseball fans believe that the designated hitter rule should either be expanded to both baseball
leagues or eliminated.


CR10.15 
a With a sample mean of 14.6, the sample standard deviation of 11.6 places zero just over one
standard deviation below the mean. Since no teenager can spend a negative time online, to get 
a typical deviation from the mean of just over 1, there must be values that are substantially 
more than one standard deviation above the mean. This suggests that the distribution of 
online times in the sample is positively skewed. 


b 1. $\mu$ = mean weekly time online for teenagers
2. $H_0: \mu = 10$
3. $H_a: \mu > 10$
4. $\alpha = 0.05$
5. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\bar{x} - 10}{s/\sqrt{n}}$
6. The sample was a random sample of teenagers. Also, $n = 534 \ge 30$. Therefore we can
proceed with the $t$ test.
7. $t = \frac{14.6 - 10}{11.6/\sqrt{534}} = 9.164$
8. P-value $= P(t_{533} > 9.164) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the mean
weekly time online for teenagers is greater than 10 hours.
Chapter 11
Comparing Two Populations or Treatments 


Note: In this chapter, numerical answers to questions involving the normal and $t$ distributions were found
using values from a calculator. Students using statistical tables will find that their answers differ slightly
from those given.


11.1 Since $n_1$ and $n_2$ are large, the distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal. Its mean is
$\mu_1 - \mu_2 = 30 - 25 = 5$ and its standard deviation is
$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{2^2}{40} + \frac{3^2}{50}} = 0.529$.
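As a numerical check on 11.1, the mean and standard deviation of $\bar{x}_1 - \bar{x}_2$ can be computed directly. The sketch below is illustrative; it assumes population standard deviations of 2 and 3 and sample sizes of 40 and 50, which are the values that reproduce the stated result of 0.529, and the variable names are hypothetical.

```python
from math import sqrt

mu_1, mu_2 = 30, 25        # population means
sigma_1, sigma_2 = 2, 3    # population standard deviations (assumed, see lead-in)
n_1, n_2 = 40, 50          # sample sizes (assumed, see lead-in)

# Mean and standard deviation of the sampling distribution of x1_bar - x2_bar
mean_diff = mu_1 - mu_2
sd_diff = sqrt(sigma_1**2 / n_1 + sigma_2**2 / n_2)

print(mean_diff, round(sd_diff, 3))
```

Note that the variances of the two sample means add even though the means themselves are subtracted.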


11.3 a We need to assume that the 22 heart attack patients who were dog owners formed a random
sample from the set of all heart attack patients who are dog owners and that the 80 heart
attack patients who did not own a dog formed an independent random sample from the set of
all heart attack patients who do not own a dog. Also, since the sample of size 22 is not large,
we need to assume that the distribution of the HRVs of all heart attack patients who are dog
owners is normal.


b 1. $\mu_1$ = mean HRV for all heart attack patients who are dog owners
$\mu_2$ = mean HRV for all heart attack patients who do not own a dog
2. $H_0: \mu_1 - \mu_2 = 0$
3. $H_a: \mu_1 - \mu_2 \ne 0$
4. $\alpha = 0.05$
5. $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\text{hypothesized value})}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
6. As stated in Part (a), we need to assume that the 22 heart attack patients who were dog
owners formed a random sample from the set of all heart attack patients who are dog
owners and that the 80 heart attack patients who did not own a dog formed an
independent random sample from the set of all heart attack patients who do not own a
dog, and that the distributions of the HRVs of all heart attack patients who are dog
owners and of all heart attack patients who do not own dogs are normal.
7. $t = \frac{873 - 800}{\sqrt{\frac{136^2}{22} + \frac{134^2}{80}}} = 2.23672$
df = 33.083
8. P-value $= 2 \cdot P(t_{33.083} > 2.23672) = 0.032$
9. Since P-value $= 0.032 < 0.05$ we reject $H_0$. We have convincing evidence that the mean
HRV for all heart attack patients who are dog owners is not equal to the mean HRV for
all heart attack patients who do not own a dog. This conclusion is consistent with that of
the paper.
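The two-sample $t$ statistic and its non-integer degrees of freedom in 11.3 can be computed from the summary statistics alone. The following Python sketch is an illustration, not part of the original solution; the function name `two_sample_t` is hypothetical, and the degrees of freedom use the usual approximation for the unpooled two-sample $t$ test, which reproduces the value 33.083 quoted in the solution.

```python
from math import sqrt


def two_sample_t(x1_bar, s1, n1, x2_bar, s2, n2, hypothesized=0.0):
    """Unpooled two-sample t statistic and approximate degrees of freedom."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    t = ((x1_bar - x2_bar) - hypothesized) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Exercise 11.3: dog owners (mean 873, s = 136, n = 22)
# versus non-owners (mean 800, s = 134, n = 80)
t, df = two_sample_t(873, 136, 22, 800, 134, 80)
print(round(t, 5), round(df, 3))
```

The P-value still requires a $t$ distribution with these degrees of freedom, which is not available in the Python standard library; a calculator or statistical package is needed for that last step.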
11.5 a [Boxplots of time per day using electronic media for the two samples; axis from 4 to 10]


We need to assume that the population distributions of time per day using electronic media
are normal. Since the boxplots are roughly symmetrical and since there is no outlier in either
sample this assumption is justified, and it is therefore reasonable to carry out a two-sample $t$
test.


b 1. $\mu_1$ = mean time using electronic media for all kids age 8 to 18 in 2009
$\mu_2$ = mean time using electronic media for all kids age 8 to 18 in 1999
2. $H_0: \mu_1 - \mu_2 = 0$
3. $H_a: \mu_1 - \mu_2 > 0$
4. $\alpha = 0.01$
5. $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\text{hypothesized value})}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
6. We are told to assume that it is reasonable to regard the two samples as representative of
kids age 8 to 18 in each of the two years when the surveys were conducted. We can then
treat the samples as random samples from their respective populations. Also, as discussed
in Part (a), the boxplots show that it is reasonable to assume that the population
distributions are normal. So we can proceed with a two-sample $t$ test.
7. $\bar{x}_1 = 7.6$, $s_1 = 1.595$, $\bar{x}_2 = 5.933$, $s_2 = 1.100$
$t = \frac{7.6 - 5.933}{\sqrt{\frac{1.595^2}{15} + \frac{1.100^2}{15}}} = 3.332$
df = 24.861
8. P-value $= P(t_{24.861} > 3.332) = 0.001$
9. Since P-value $= 0.001 < 0.01$ we reject $H_0$. We have convincing evidence that the mean
number of hours per day spent using electronic media was greater in 2009 than in 1999.


c As explained in Parts (a) and (b), the conditions for the two-sample $t$ test or interval are
satisfied. A 98% confidence interval for $\mu_1 - \mu_2$ is

$(\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value})\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (7.6 - 5.93333) \pm 2.48605\sqrt{\frac{1.595^2}{15} + \frac{1.100^2}{15}} = (0.423, 2.910)$

We are 98% confident that the difference between the mean number of hours per day spent
using electronic media in 2009 and 1999 is between 0.423 and 2.910.
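The 98% interval in Part (c) can be verified from the summary statistics once the $t$ critical value is known. In the Python sketch below, the critical value 2.48605 is taken from the solution above rather than computed, since the standard library has no $t$ distribution; the function name `two_sample_t_interval` is hypothetical.

```python
from math import sqrt


def two_sample_t_interval(x1_bar, s1, n1, x2_bar, s2, n2, t_critical):
    """Two-sample t confidence interval for mu_1 - mu_2 (unpooled)."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    difference = x1_bar - x2_bar
    margin = t_critical * standard_error
    return difference - margin, difference + margin


# Exercise 11.5(c): 2009 sample versus 1999 sample, 15 observations each.
lower, upper = two_sample_t_interval(7.6, 1.595, 15, 5.93333, 1.100, 15, 2.48605)
print(round(lower, 3), round(upper, 3))
```

Since the whole interval lies above zero, it is consistent with the conclusion of the one-sided test in Part (b).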


11.7 1. $\mu_1$ = mean food intake for the 4-hour sleep treatment
$\mu_2$ = mean food intake for the 8-hour sleep treatment
2. $H_0: \mu_1 - \mu_2 = 0$
3. $H_a: \mu_1 - \mu_2 \ne 0$
4. $\alpha = 0.05$
5. $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\text{hypothesized value})}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
6. [Boxplots of food intake for the two treatments; axis from 2000 to 6000]
The experimental subjects were randomly assigned to the two sleep conditions. Also, since
the two boxplots are roughly symmetrical and there was no outlier in either group we are
justified in assuming that the food intake distributions for the two treatments are normal.
Thus, we can proceed with the two-sample $t$ test.
7. $\bar{x}_1 = 3924$, $s_1 = 829.668$, $\bar{x}_2 = 4069.267$, $s_2 = 952.896$
$t = \frac{3924 - 4069.267}{\sqrt{\frac{829.668^2}{15} + \frac{952.896^2}{15}}} = -0.445$
df = 27.480
8. P-value $= 2 \cdot P(t_{27.480} < -0.445) = 0.660$
9. Since P-value $= 0.660 > 0.05$ we do not reject $H_0$. We do not have convincing evidence of a
difference in the means for the two sleep treatments.


If the vertebroplasty group had been compared to a group of patients who did not receive any 
treatment, and if, for example, the people in the vertebroplasty group experienced a greater 
pain reduction on average than the people in the “no treatment” group, then it would be 
impossible to tell whether the observed pain reduction in the vertebroplasty group was caused 
by the treatment or merely by the subjects’ knowledge that some treatment was being applied. 
By using a placebo group it is ensured that the subjects in both groups have the knowledge of
some “treatment,” so that any differences between the pain reduction in the two groups can 
be attributed to the nature of the vertebroplasty treatment. 


b Check of Conditions
Since $n_1 = 68 \geq 30$ and $n_2 = 63 \geq 30$, if we assume that the subjects were randomly assigned
to the treatments, we can proceed with construction of a two-sample t interval.
Calculation
df = 127.402. The 95% confidence interval for $\mu_1 - \mu_2$ is

\[ (\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (4.2 - 3.9) \pm 1.979 \sqrt{\frac{s_1^2}{68} + \frac{s_2^2}{63}} = (-0.687, 1.287) \]

Interpretation
We are 95% confident that the difference in mean pain intensity 3 days after treatment for the
vertebroplasty treatment and the fake treatment is between −0.687 and 1.287.


c 14 days:
Check of Conditions
See Part (b).
Calculation
df = 128.774. The 95% confidence interval for $\mu_1 - \mu_2$ is

\[ (\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (4.3 - 4.5) \pm 1.979 \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (-1.186, 0.786) \]

Interpretation
We are 95% confident that the difference in mean pain intensity 14 days after treatment for
the vertebroplasty treatment and the fake treatment is between −1.186 and 0.786.


1 month:
Check of Conditions
See Part (b).
Calculation
df = 127.435. The 95% confidence interval for $\mu_1 - \mu_2$ is

\[ (\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (3.9 - 4.6) \pm 1.979 \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (-1.722, 0.322) \]

Interpretation
We are 95% confident that the difference in mean pain intensity 1 month after treatment for
the vertebroplasty treatment and the fake treatment is between −1.722 and 0.322.


d The fact that all of the intervals contain zero tells us that we do not have convincing evidence
at the 0.05 level of a difference in the mean pain intensity for the vertebroplasty treatment and
the fake treatment at any of the three times.
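The reasoning in Part (d) amounts to checking whether each 95% interval contains zero. A minimal sketch of that check, using the three intervals reported above (the dictionary labels and the helper `contains_zero` are hypothetical names):

```python
# 95% confidence intervals for mu1 - mu2 reported in Parts (b) and (c).
intervals = {
    "3 days": (-0.687, 1.287),
    "14 days": (-1.186, 0.786),
    "1 month": (-1.722, 0.322),
}

def contains_zero(interval):
    """True when zero lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= 0 <= high

results = {label: contains_zero(ci) for label, ci in intervals.items()}
for label, has_zero in results.items():
    print(label, "contains zero" if has_zero else "excludes zero")
```

An interval that contains zero corresponds to not rejecting the hypothesis of no difference in a two-sided test at the matching significance level.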


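11.11 1. $\mu_1$ = mean daily commute for Calgary working males
$\mu_2$ = mean daily commute for Calgary working females

2. $H_0$: $\mu_1 - \mu_2 = 0$
3. $H_a$: $\mu_1 - \mu_2 \neq 0$
4. $\alpha = 0.05$
5. \[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\text{hypothesized value})}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
6. We are told that the samples were random samples from the populations. Also $n_1 = 247 \geq 30$
and $n_2 = 253 \geq 30$, so we can proceed with the two-sample t test.
7. \[ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{24.3^2}{247} + \dfrac{24.0^2}{253}}} = 1.065 \]
8. df = 497.830
P-value $= 2 \cdot P(t_{497.830} > 1.065) = 0.288$
9. Since P-value = 0.288 > 0.05 we do not reject $H_0$. We do not have convincing evidence that
the mean commute times for male and female working Calgary residents differ.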


11.13 a $\mu_1$ = mean payment for claims not involving errors
$\mu_2$ = mean payment for claims involving errors
$H_0$: $\mu_1 - \mu_2 = 0$
$H_a$: $\mu_1 - \mu_2 < 0$
b Answer: (ii) 2.65. Since the samples are large, we are using a t distribution with a large
number of degrees of freedom, which can be approximated with the standard normal
distribution. $P(Z > 2.65) = 0.004$, which is the P-value given. None of the other possible
values of t gives the correct P-value.


11.15 a Check of Conditions

[Boxplots of breaking force for the dry and wet media; horizontal axis: Breaking Force, 300 to 400]

The boxplots are roughly symmetrical (in such small samples no greater degree of symmetry
can be expected, even from perfectly normally distributed populations) and neither data set
contains outliers, so we are justified in assuming normal distributions for the populations.
Therefore, if we assume that the cement bonds were randomly assigned to the treatments, we
can proceed with construction of a two-sample t interval.

Calculation
df = 8.765. The 90% confidence interval for $\mu_1 - \mu_2$ is

\[ (\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (311.6 - 355.583) \pm 1.839 \sqrt{\frac{s_1^2}{6} + \frac{s_2^2}{6}} = (-68.668, -19.299) \]

Interpretation
We are 90% confident that the difference between the mean breaking force in a dry medium
at 37 degrees and the mean breaking force at the same temperature in a wet medium is
between −68.668 and −19.299.


b 1. $\mu_1$ = mean breaking strength in a dry medium at 37 degrees
$\mu_2$ = mean breaking strength in a dry medium at 22 degrees
2. $H_0$: $\mu_1 - \mu_2 = 100$
3. $H_a$: $\mu_1 - \mu_2 > 100$
4. $\alpha = 0.1$
5. \[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\text{hypothesized value})}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - 100}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
6. [Boxplots of breaking force in a dry medium at 37 degrees and at 22 degrees; horizontal axis: Breaking Force, 100 to 350]

The boxplots are roughly symmetrical and neither data set contains outliers, so we are
justified in assuming normal distributions for the populations. Therefore, if we assume
that the cement bonds were randomly assigned to the treatments, we can proceed with the
two-sample t test.
7. \[ t = \frac{(311.6 - 157.517) - 100}{\sqrt{\dfrac{s_1^2}{6} + \dfrac{s_2^2}{6}}} = 2.762 \]
8. df = 6.671
P-value $= P(t_{6.671} > 2.762) = 0.015$
9. Since P-value = 0.015 < 0.1 we reject $H_0$. We have convincing evidence that the mean
breaking force in a dry medium at the higher temperature is greater than the mean
breaking force at the lower temperature by more than 100 N.


11.17 a 1. $\mu_1$ = mean percentage of time playing with police car for male monkeys
$\mu_2$ = mean percentage of time playing with police car for female monkeys
2. $H_0$: $\mu_1 - \mu_2 = 0$
3. $H_a$: $\mu_1 - \mu_2 > 0$
4. $\alpha = 0.05$
5. \[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\text{hypothesized value})}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
6. We are told that it is reasonable to regard these two samples of 44 monkeys as
representative of the populations of male and female monkeys. It is therefore reasonable
to regard them as random samples. Also $n_1 = 44 \geq 30$ and $n_2 = 44 \geq 30$, so we can
proceed with the two-sample t test.
7. \[ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{44} + \dfrac{s_2^2}{44}}} = 10.359 \]
8. df = 82.047
P-value $= P(t_{82.047} > 10.359) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that the mean
percentage of the time spent playing with the police car is greater for male monkeys than
for female monkeys.


b 1. $\mu_1$ = mean percentage of time playing with doll for male monkeys
$\mu_2$ = mean percentage of time playing with doll for female monkeys
2. $H_0$: $\mu_1 - \mu_2 = 0$
3. $H_a$: $\mu_1 - \mu_2 < 0$
4. $\alpha = 0.05$
5. \[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\text{hypothesized value})}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
6. We are told that it is reasonable to regard these two samples of 44 monkeys as
representative of the populations of male and female monkeys. It is therefore reasonable
to regard them as random samples. Also $n_1 = 44 \geq 30$ and $n_2 = 44 \geq 30$, so we can
proceed with the two-sample t test.
7. \[ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{44} + \dfrac{s_2^2}{44}}} = -16.316 \]
8. df = 63.235
P-value $= P(t_{63.235} < -16.316) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that the mean
percentage of the time spent playing with the doll is greater for female monkeys than for
male monkeys.


c 1. $\mu_1$ = mean percentage of time playing with furry dog for male monkeys
$\mu_2$ = mean percentage of time playing with furry dog for female monkeys
2. $H_0$: $\mu_1 - \mu_2 = 0$
3. $H_a$: $\mu_1 - \mu_2 \neq 0$
4. $\alpha = 0.05$
5. \[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\text{hypothesized value})}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
6. We are told that it is reasonable to regard these two samples of 44 monkeys as
representative of the populations of male and female monkeys. It is therefore reasonable
to regard them as random samples. Also $n_1 = 44 \geq 30$ and $n_2 = 44 \geq 30$, so we can
proceed with the two-sample t test.
7. \[ t = \frac{25 - 20}{\sqrt{\dfrac{5^2}{44} + \dfrac{5^2}{44}}} = 4.690 \]
8. df = 86
P-value $= 2 \cdot P(t_{86} > 4.690) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that the mean
percentage of the time spent playing with the furry dog is not the same for male monkeys
as it is for female monkeys.
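The whole-number value df = 86 in Part (c) is not a coincidence. As a brief check (a sketch that assumes the usual approximate degrees-of-freedom formula for the two-sample t test, with $V_i = s_i^2/n_i$), equal standard deviations and equal sample sizes give $V_1 = V_2 = V$ and $n_1 = n_2 = n$, so

\[ \text{df} = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} = \frac{(2V)^2}{\dfrac{2V^2}{n - 1}} = 2(n - 1) = 2(44 - 1) = 86. \]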


d The results do seem to provide convincing evidence of a gender basis in the monkeys’
choices of how much time to spend playing with each toy, with the male monkeys spending 
significantly more time with the “masculine toy” than the female monkeys, and with the 
female monkeys spending significantly more time with the “feminine toy” than the male 
monkeys. However, the data also provide convincing evidence of a difference between male 
and female monkeys in the time they choose to spend playing with a “neutral toy.” It is 
possible that it was some attribute other than masculinity/femininity in the toys that was 
attracting the different genders of monkey in different ways. 


e The given mean time playing with the police car and mean time playing with the doll for 
female monkeys are sample means for the same sample of female monkeys. The two-sample t
test can only be performed when there are two independent random samples. 


11.19 a Since the samples are small it is necessary to know, or to assume, that the distributions from 
which the random samples were taken are normal. However, in this case, since both standard 
deviations are large compared to the means, it seems unlikely that these distributions would 
have been normal. 


b Now, since the samples are large, it is appropriate to carry out the two-sample t test, whatever
the distributions from which the samples were taken. 


c 1. $\mu_1$ = mean fumonisin level for corn meal made from partially degermed corn
$\mu_2$ = mean fumonisin level for corn meal made from corn that has not been degermed
2. $H_0$: $\mu_1 - \mu_2 = 0$
3. $H_a$: $\mu_1 - \mu_2 \neq 0$
4. $\alpha = 0.01$
5. \[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\text{hypothesized value})}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
6. We are told that the samples were random samples from the populations. Also
$n_1 = 50 \geq 30$ and $n_2 = 50 \geq 30$, so we can proceed with the two-sample t test.
7. \[ t = \frac{0.59 - 1.21}{\sqrt{\dfrac{s_1^2}{50} + \dfrac{s_2^2}{50}}} = -2.207 \]
8. df = 79.479
P-value $= 2 \cdot P(t_{79.479} < -2.207) = 0.030$
9. Since P-value = 0.030 > 0.01 we do not reject $H_0$. We do not have convincing evidence
that there is a difference in mean fumonisin level for the two types of corn meal.


11.21 a 1. $\mu_1$ = mean oxygen consumption for noncourting pairs
$\mu_2$ = mean oxygen consumption for courting pairs
2. $H_0$: $\mu_1 - \mu_2 = 0$
3. $H_a$: $\mu_1 - \mu_2 < 0$
4. $\alpha = 0.05$
5. \[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\text{hypothesized value})}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}, \quad \text{where } s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
6. We need to assume that the samples were random samples from the populations, and that
the population distributions are normal. Additionally, the similar sample standard
deviations justify our assumption that the populations have equal standard deviations.
7. \[ s_p = \sqrt{\frac{10(0.0066)^2 + 14(0.0071)^2}{24}} = 0.006907, \qquad t = \frac{0.072 - 0.099}{\sqrt{\dfrac{0.006907^2}{11} + \dfrac{0.006907^2}{15}}} = -9.863 \]
8. df = 24
P-value $= P(t_{24} < -9.863) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that the mean
oxygen consumption for courting pairs is higher than the mean oxygen consumption for
noncourting pairs.


b For the two-sample t test, t = −9.979, df = 22.566, and P-value ≈ 0. Thus the conclusion is
the same.
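The pooled calculation in Part (a) can be sketched in a few lines. This is an illustration only: `pooled_t` is a hypothetical helper, and it uses the rounded summary statistics printed in the solution, so the pooled standard deviation it returns may differ slightly from the printed value in the last digits.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled two-sample t test from summary statistics (equal-variance assumption)."""
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two sample variances.
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    # sqrt(sp^2/n1 + sp^2/n2) written as sp * sqrt(1/n1 + 1/n2).
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp, t, df

sp, t_stat, df = pooled_t(0.072, 0.0066, 11, 0.099, 0.0071, 15)
print(round(sp, 4), round(t_stat, 2), df)
```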


11.23 For each pipe, one side (left/right) could be coated with the first type of coating, and the other 
side could be coated with the other type of coating, with the sides being chosen at random for 
each pipe. Then the two coatings are being tested under almost exactly equal conditions in terms
of the extraneous variables mentioned. 

11.25 1. $\mu_d$ = mean swimming velocity difference (water − guar syrup)
2. $H_0$: $\mu_d = 0$
3. $H_a$: $\mu_d \neq 0$
4. $\alpha = 0.01$
5. \[ t = \frac{\bar{x}_d - \text{hypothesized value}}{s_d / \sqrt{n}} \]
6. [Boxplot of the differences; horizontal axis: Difference, −0.050 to 0.075]

The boxplot shows that the distribution of the differences is roughly symmetrical and has no
outliers, so we are justified in assuming that the population distribution of differences is
normal. Additionally, we need to assume that this set of differences forms a random sample
from the set of differences for all swimmers.
7. $\bar{x}_d = -0.004$, $s_d = 0.035$

\[ t = \frac{-0.004 - 0}{0.035 / \sqrt{20}} = -0.515 \]
8. df = 19
P-value $= 2 \cdot P(t_{19} < -0.515) = 0.612$
9. Since P-value = 0.612 > 0.01 we do not reject $H_0$. We do not have convincing evidence of a
difference between the mean swimming speeds in water and guar syrup. The given data are
consistent with the authors’ conclusion.


11.27 a 1. $\mu_d$ = mean difference in MPF (Location 1, Before − After)
2. $H_0$: $\mu_d = 0$
3. $H_a$: $\mu_d < 0$
4. $\alpha = 0.05$
5. \[ t = \frac{\bar{x}_d - \text{hypothesized value}}{s_d / \sqrt{n}} \]
6. [Boxplot of the differences; horizontal axis: Difference, −5 to 2]

The boxplot shows that the distribution of the differences is roughly symmetrical and has
no outliers, so we are justified in assuming that the population distribution of differences
is normal. Additionally, we are told to assume that it is reasonable to regard the sample of
ten men as representative of healthy adult males, and so we can treat the sample as a
random sample from that population.
7. $\bar{x}_d = -1.930$, $s_d = 1.965$

\[ t = \frac{-1.930 - 0}{1.965 / \sqrt{10}} = -3.106 \]
8. df = 9
P-value $= P(t_9 < -3.106) = 0.006$
9. Since P-value = 0.006 < 0.05 we reject $H_0$. We have convincing evidence that the mean
MPF at brain location 1 is higher after diesel exposure.


b Check of Conditions

[Boxplot of the differences; horizontal axis: Difference, −4 to 0]

The boxplot shows that the distribution of the differences (before − after at location 2) is
roughly symmetrical and has no outliers, so we are justified in assuming that the population
distribution of differences is normal. Additionally, we are told to assume that it is reasonable
to regard the sample of ten men as representative of healthy adult males, and so we can treat
the sample as a random sample from that population.

Calculation
df = 9. The 90% confidence interval for $\mu_d$ is

\[ \bar{x}_d \pm (t \text{ critical value}) \frac{s_d}{\sqrt{n}} = -1.54 \pm 1.833 \cdot \frac{s_d}{\sqrt{10}} = (-2.228, -0.852) \]

Interpretation
We are 90% confident that the difference in mean MPF at brain location 2 before and after
exposure to diesel exhaust is between −2.228 and −0.852.


11.29 a 1. $\mu_d$ = mean difference between profile height and actual height (profile − actual)
2. $H_0$: $\mu_d = 0$
3. $H_a$: $\mu_d > 0$
4. $\alpha = 0.05$
5. \[ t = \frac{\bar{x}_d - \text{hypothesized value}}{s_d / \sqrt{n}} \]
6. We are told to assume that the sample is representative of male online daters, and
therefore we are justified in treating it as a random sample. Therefore, since n = 40 ≥ 30,
we can proceed with the paired t test.
7. \[ t = \frac{\bar{x}_d - 0}{0.81 / \sqrt{40}} = 4.451 \]
8. df = 39
P-value $= P(t_{39} > 4.451) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that, on average,
male online daters overstate their height in online dating profiles.


b Check of Conditions
We are told to assume that the sample is representative of female online daters, and therefore
we are justified in treating it as a random sample. Therefore, since n = 40 ≥ 30, we can
proceed with the paired t interval.
Calculation
df = 39. The 95% confidence interval for $\mu_d$ is

\[ \bar{x}_d \pm (t \text{ critical value}) \frac{s_d}{\sqrt{n}} = 0.03 \pm 2.023 \cdot \frac{s_d}{\sqrt{40}} = (-0.210, 0.270) \]

Interpretation
We are 95% confident that the difference between the mean online dating profile height and
mean actual height for female online daters is between −0.210 and 0.270.


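c 1. $\mu_M$ = mean height difference (profile − actual) for male online daters
$\mu_F$ = mean height difference (profile − actual) for female online daters
2. $H_0$: $\mu_M - \mu_F = 0$
3. $H_a$: $\mu_M - \mu_F > 0$
4. $\alpha = 0.05$
5. \[ t = \frac{(\bar{x}_M - \bar{x}_F) - (\text{hypothesized value})}{\sqrt{\dfrac{s_M^2}{n_M} + \dfrac{s_F^2}{n_F}}} = \frac{(\bar{x}_M - \bar{x}_F) - 0}{\sqrt{\dfrac{s_M^2}{n_M} + \dfrac{s_F^2}{n_F}}} \]
6. We are told to assume that the samples were representative of the populations, and
therefore we are justified in assuming that they are random samples. Also $n_M = 40 \geq 30$
and $n_F = 40 \geq 30$, so we can proceed with the two-sample t test.
7. \[ t = \frac{(\bar{x}_M - \bar{x}_F) - 0}{\sqrt{\dfrac{0.81^2}{40} + \dfrac{s_F^2}{40}}} = 3.094 \]
8. df = 77.543
P-value $= P(t_{77.543} > 3.094) = 0.001$
9. Since P-value = 0.001 < 0.05 we reject $H_0$. We have convincing evidence that
$\mu_M - \mu_F > 0$.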


d In Part (a), the male profile heights and the male actual heights are paired (according to which
individual has the actual height and the height stated in the profile), and with paired samples
we use the paired t test. In Part (c) we were dealing with two independent samples (the
sample of males and the sample of females), and therefore the two-sample t test was
appropriate.


11.31 1. $\mu_d$ = mean bone mineral content difference (breast feeding − postweaning)
2. $H_0$: $\mu_d = -25$
3. $H_a$: $\mu_d < -25$
4. $\alpha = 0.05$
5. \[ t = \frac{\bar{x}_d - \text{hypothesized value}}{s_d / \sqrt{n}} \]
6. [Boxplot of the differences; horizontal axis: Difference, −350 to 0]

The boxplot shows that the distribution of the sample differences is negatively skewed, but
for a relatively small sample this distribution is not inconsistent with a population that is
normally distributed. Additionally, the sample distribution of differences has no outliers. We
need to assume that the mothers used in the study formed a random sample from the
population of mothers.
7. $\bar{x}_d = -105.7$, $s_d = 103.845$

\[ t = \frac{-105.7 - (-25)}{103.845 / \sqrt{10}} = -2.457 \]
8. df = 9
P-value $= P(t_9 < -2.457) = 0.018$
9. Since P-value = 0.018 < 0.05 we reject $H_0$. We have convincing evidence that the average
total body bone mineral content during postweaning is greater than that during breast feeding
by more than 25 grams.

11.33 a 1. $\mu_d$ = mean difference in wrist extension (type A − type B)
2. $H_0$: $\mu_d = 0$
3. $H_a$: $\mu_d > 0$
4. $\alpha = 0.05$
5. \[ t = \frac{\bar{x}_d - \text{hypothesized value}}{s_d / \sqrt{n}} \]
6. We are told to assume that the sample is representative of the population of computer
users, and therefore we are justified in treating it as a random sample from that
population. However, in order to proceed with the paired t test we need to assume that the
population of differences is normally distributed.
7. \[ t = \frac{8.82 - 0}{10 / \sqrt{24}} = 4.321 \]
8. df = 23
P-value $= P(t_{23} > 4.321) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that the mean wrist
extension for mouse type A is greater than for mouse type B.

b Now $t = \dfrac{8.82 - 0}{26 / \sqrt{24}} = 1.662$, and so P-value $= P(t_{23} > 1.662) = 0.055$. Since
P-value = 0.055 > 0.05 we do not reject $H_0$. We do not have convincing evidence that the
mean wrist extension for mouse type A is greater than for mouse type B.


c A lower standard deviation in the sample of differences means that we have a lower estimate
of the standard deviation of the population of differences. Assuming that the mean wrist 
extensions for the two mouse types are the same (in other words, that the mean of the 
population of differences is zero), a sample mean difference of as much as 8.82 is much less 
likely when the standard deviation of the population of differences is around 10 than when 
the standard deviation of the population of differences is around 26. 
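The effect described here can be seen directly by holding the sample mean difference and sample size fixed and changing only the standard deviation. A minimal sketch (the helper name `paired_t` is hypothetical) using the values from Parts (a) and (b):

```python
import math

def paired_t(mean_diff, sd_diff, n, hypothesized=0.0):
    """Paired t statistic from the summary statistics of the differences."""
    return (mean_diff - hypothesized) / (sd_diff / math.sqrt(n))

t_sd10 = paired_t(8.82, 10, 24)  # standard deviation of differences = 10
t_sd26 = paired_t(8.82, 26, 24)  # standard deviation of differences = 26
print(round(t_sd10, 3), round(t_sd26, 3))
```

The same mean difference sits far more standard errors away from zero when the differences are less variable, which is why only the first test rejects the null hypothesis.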


11.35 $\mu_d$ = mean difference between verbal ability score at age 8 and verbal ability score at age 3
(age 8 − age 3)
$H_0$: $\mu_d = 0$
$H_a$: $\mu_d > 0$
$\alpha = 0.05$
We are told to assume that the sample is a random sample from the population of children born
prematurely. Therefore, since n = 50 ≥ 30, we can proceed with the paired t test.
P-value = 0.001
Since P-value = 0.001 < 0.05 we reject $H_0$. We have convincing evidence that the mean verbal
ability score for children born prematurely increases between age 3 and age 8.


11.37 1. $p_1$ = proportion of guests who reserve by phone who are satisfied
$p_2$ = proportion of guests who reserve online who are satisfied
2. $H_0$: $p_1 - p_2 = 0$
3. $H_a$: $p_1 - p_2 < 0$
4. $\alpha = 0.05$
5. \[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} \]
6. We are told that the samples were independent random samples from the populations. Also
$n_1\hat{p}_1 = 80(57/80) = 57 \geq 10$, $n_1(1 - \hat{p}_1) = 80(23/80) = 23 \geq 10$, $n_2\hat{p}_2 = 60(50/60) = 50 \geq 10$,
and $n_2(1 - \hat{p}_2) = 60(10/60) = 10 \geq 10$, so the samples are large enough.
7. \[ \hat{p}_c = \frac{57 + 50}{80 + 60} = \frac{107}{140} \]
\[ z = \frac{57/80 - 50/60}{\sqrt{\dfrac{(107/140)(33/140)}{80} + \dfrac{(107/140)(33/140)}{60}}} = -1.667 \]
8. P-value $= P(Z < -1.667) = 0.048$
9. Since P-value = 0.048 < 0.05 we reject $H_0$. We have convincing evidence that the proportion
who are satisfied is higher for those who reserve online than for those who reserve by phone.
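The z statistic and P-value above can be checked with the Python standard library. The sketch below is illustrative rather than part of the original solution; `two_proportion_z` is a hypothetical helper that takes success counts and sample sizes and uses the pooled proportion in the standard error.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Large-sample z statistic for H0: p1 - p2 = 0 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)  # combined (pooled) sample proportion
    se = math.sqrt(pc * (1 - pc) / n1 + pc * (1 - pc) / n2)
    return (p1 - p2) / se

z = two_proportion_z(57, 80, 50, 60)
p_value = NormalDist().cdf(z)  # lower-tail area, matching Ha: p1 - p2 < 0
print(round(z, 3), round(p_value, 3))
```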


11.39 a 1. $p_1$ = proportion of Gen Y respondents who donated by text message
$p_2$ = proportion of Gen X respondents who donated by text message
2. $H_0$: $p_1 - p_2 = 0$
3. $H_a$: $p_1 - p_2 > 0$
4. $\alpha = 0.01$
5. \[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} \]
6. We are told to regard the samples as representative of the Gen Y and Gen X populations,
so it is reasonable to treat them as independent random samples from the populations.
Also $n_1\hat{p}_1 = 400(0.17) = 68 \geq 10$, $n_1(1 - \hat{p}_1) = 400(0.83) = 332 \geq 10$,
$n_2\hat{p}_2 = 400(0.14) = 56 \geq 10$, and $n_2(1 - \hat{p}_2) = 400(0.86) = 344 \geq 10$, so the samples are
large enough.
7. \[ \hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{400(0.17) + 400(0.14)}{400 + 400} = 0.155 \]
\[ z = \frac{0.17 - 0.14}{\sqrt{\dfrac{(0.155)(0.845)}{400} + \dfrac{(0.155)(0.845)}{400}}} = 1.172 \]
8. P-value $= P(Z > 1.172) = 0.121$
9. Since P-value = 0.121 > 0.01 we do not reject $H_0$. We do not have convincing evidence
that the proportion of those in Gen Y who donated to Haiti relief via text message is
greater than the proportion of those in Gen X.



b Check of Conditions
See Part (a).
Calculation
The 99% confidence interval for $p_1 - p_2$ is

\[ (\hat{p}_1 - \hat{p}_2) \pm (z \text{ critical value}) \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (0.17 - 0.14) \pm 2.576 \sqrt{\frac{(0.17)(0.83)}{400} + \frac{(0.14)(0.86)}{400}} = (-0.036, 0.096) \]

Interpretation of Interval
We are 99% confident that the difference between the proportion of Gen Y and the proportion
of Gen X who made a donation via text message is between −0.036 and 0.096.


Interpretation of Confidence Level 
In repeated sampling with random samples of size 400, 99% of the resulting confidence 
intervals would contain the true difference in proportions who donated via text message. 
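The interval in Part (b) uses the unpooled standard error, unlike the test in Part (a). A minimal sketch of the calculation (the helper name `two_proportion_interval` is hypothetical; the z critical value is obtained from the standard normal distribution in the Python standard library):

```python
import math
from statistics import NormalDist

def two_proportion_interval(p1, n1, p2, n2, confidence):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

low, high = two_proportion_interval(0.17, 400, 0.14, 400, confidence=0.99)
print(round(low, 3), round(high, 3))
```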


11.41 a 1. $p_1$ = proportion of American teenage girls who say that newspapers are boring
$p_2$ = proportion of American teenage boys who say that newspapers are boring
2. $H_0$: $p_1 - p_2 = 0$
3. $H_a$: $p_1 - p_2 \neq 0$
4. $\alpha = 0.05$
5. \[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} \]
6. The samples were representative of the populations, so it is reasonable to treat them as
independent random samples from the populations. Also $n_1\hat{p}_1 = 58(0.41) = 24 \geq 10$,
$n_1(1 - \hat{p}_1) = 58(0.59) = 34 \geq 10$, $n_2\hat{p}_2 = 41(0.44) = 18 \geq 10$, and
$n_2(1 - \hat{p}_2) = 41(0.56) = 23 \geq 10$, so the samples are large enough.
7. \[ \hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{58(0.41) + 41(0.44)}{58 + 41} = 0.422 \]
\[ z = \frac{0.41 - 0.44}{\sqrt{\dfrac{(0.422)(0.578)}{58} + \dfrac{(0.422)(0.578)}{41}}} = -0.298 \]
8. P-value $= 2 \cdot P(Z < -0.298) = 0.766$
9. Since P-value = 0.766 > 0.05 we do not reject $H_0$. We do not have convincing evidence
that the proportion of girls who say that newspapers are boring is different from the
proportion of boys who say that newspapers are boring.


b Since the samples are larger than in Part (a), the conditions for performing the test are also
satisfied here. The calculations will change to the following:

\[ \hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{2000(0.41) + 2500(0.44)}{2000 + 2500} = 0.427 \]
\[ z = \frac{0.41 - 0.44}{\sqrt{\dfrac{(0.427)(0.573)}{2000} + \dfrac{(0.427)(0.573)}{2500}}} = -2.022 \]

P-value $= 2 \cdot P(Z < -2.022) = 0.043$

Since P-value = 0.043 < 0.05 we reject $H_0$. We have convincing evidence that the proportion
of girls who say that newspapers are boring is different from the proportion of boys who say
that newspapers are boring.


c Assuming that the population proportions are equal, you are much less likely to get a 
difference in sample proportions as large as the one given when the samples are very large 
than when the samples are relatively small, since large samples are likely to give more 
accurate estimates of the population proportions. Therefore, when the given difference in 
sample proportions was based on larger samples, this produced stronger evidence of a 
difference in population proportions. 
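The point made in Part (c) can be illustrated by running the same test with the same sample proportions and the two pairs of sample sizes. This is a sketch only; `pooled_z` and `two_sided_p` are hypothetical helper names.

```python
import math
from statistics import NormalDist

def pooled_z(p1, n1, p2, n2):
    """z statistic for H0: p1 - p2 = 0 from sample proportions and sizes."""
    pc = (n1 * p1 + n2 * p2) / (n1 + n2)  # combined sample proportion
    se = math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def two_sided_p(z):
    """Two-sided P-value from the standard normal distribution."""
    return 2 * NormalDist().cdf(-abs(z))

# Same sample proportions (0.41 and 0.44), small versus large samples.
for n1, n2 in [(58, 41), (2000, 2500)]:
    z = pooled_z(0.41, n1, 0.44, n2)
    print(n1, n2, round(z, 3), round(two_sided_p(z), 3))
```

The difference in sample proportions is identical in both runs; only the standard error shrinks as the samples grow, which moves z further from zero.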


11.43 a Check of Conditions
We are told to regard the samples as representative of teens before and after the ban, so it is
reasonable to treat them as independent random samples from these populations. Also
$n_1\hat{p}_1 = 200(0.11) = 22 \geq 10$, $n_1(1 - \hat{p}_1) = 200(0.89) = 178 \geq 10$, $n_2\hat{p}_2 = 150(0.12) = 18 \geq 10$,
and $n_2(1 - \hat{p}_2) = 150(0.88) = 132 \geq 10$, so the samples are large enough.

Calculation
The 95% confidence interval for $p_1 - p_2$ is

\[ (\hat{p}_1 - \hat{p}_2) \pm (z \text{ critical value}) \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (0.11 - 0.12) \pm 1.96 \sqrt{\frac{(0.11)(0.89)}{200} + \frac{(0.12)(0.88)}{150}} = (-0.078, 0.058) \]

Interpretation
We are 95% confident that the difference between the proportion of teenagers using a cell
phone before the ban and the proportion of teenagers using a cell phone after the ban is
between −0.078 and 0.058.

b Zero is included in the confidence interval. This tells us that we do not have convincing
evidence at a 0.05 significance level of a difference between the proportion of teenagers using
a cell phone before the ban and the proportion of teenagers using a cell phone after the ban.


11.45 No. It is not appropriate to use the two-sample z test because the groups are not large enough. We 
are not told the sizes of the groups, but we know that each is at most 81. The sample proportion 
for the fish oil group is 0.05, and 81(0.05) = 4.05, which is less than 10. So the conditions for the 
two-sample z test are not satisfied. 
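The large-sample condition used throughout these solutions is easy to check mechanically. A small sketch (the helper `large_enough` is a hypothetical name) applied to the largest group size that is possible here:

```python
def large_enough(n, p_hat, threshold=10):
    """Check that both n*p_hat and n*(1 - p_hat) are at least the threshold."""
    return n * p_hat >= threshold and n * (1 - p_hat) >= threshold

# Even at the largest possible group size, 81(0.05) = 4.05 falls short of 10.
ok = large_enough(81, 0.05)
print(ok)
```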


11.47 1. $p_1$ = proportion of passengers on airplanes that do not recirculate air who have post-flight
respiratory symptoms
$p_2$ = proportion of passengers on airplanes that recirculate air who have post-flight
respiratory symptoms
2. $H_0$: $p_1 - p_2 = 0$
3. $H_a$: $p_1 - p_2 \neq 0$
4. $\alpha = 0.05$
5. \[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} \]
6. We are told to assume that it is reasonable to regard the two samples as being independently
selected and as representative of the two populations. Therefore it is reasonable to treat the
samples as independent random samples from the populations. Also,
$n_1\hat{p}_1 = 517(108/517) = 108 \geq 10$, $n_1(1 - \hat{p}_1) = 517(409/517) = 409 \geq 10$,
$n_2\hat{p}_2 = 583(111/583) = 111 \geq 10$, and $n_2(1 - \hat{p}_2) = 583(472/583) = 472 \geq 10$, so the samples
are large enough.
7. \[ \hat{p}_c = \frac{108 + 111}{517 + 583} = \frac{219}{1100} \]
\[ z = \frac{108/517 - 111/583}{\sqrt{\dfrac{(219/1100)(881/1100)}{517} + \dfrac{(219/1100)(881/1100)}{583}}} = 0.767 \]
8. P-value $= 2 \cdot P(Z > 0.767) = 0.443$
9. Since P-value = 0.443 > 0.05 we do not reject $H_0$. We do not have convincing evidence that
the proportion of passengers with post-flight respiratory symptoms differs for planes that do
and do not recirculate air.

11.49 Check of Conditions
We are told that the samples were random samples from the populations of Americans age 12 and
over. Also, $n_1\hat{p}_1 = 1112(0.2) = 222 \geq 10$, $n_1(1 - \hat{p}_1) = 1112(0.8) = 890 \geq 10$,
$n_2\hat{p}_2 = 1112(0.15) = 167 \geq 10$, and $n_2(1 - \hat{p}_2) = 1112(0.85) = 945 \geq 10$, so the samples are large
enough.

Calculation
The 95% confidence interval for $p_1 - p_2$ is

\[ (\hat{p}_1 - \hat{p}_2) \pm (z \text{ critical value}) \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (0.2 - 0.15) \pm 1.96 \sqrt{\frac{(0.2)(0.8)}{1112} + \frac{(0.15)(0.85)}{1112}} = (0.018, 0.082) \]


Interpretation 

We are 95% confident that the proportion of Americans age 12 and over who owned an MP3 
player in 2006 minus the proportion of Americans age 12 and over who owned an MP3 player in 
2005 is between 0.018 and 0.082. 


Zero is not included in the confidence interval. This means that we have convincing evidence at 
the 0.05 significance level of a difference between the proportions of people owning MP3 players 
in 2006 and 2005. 


11.51 1. $p_1$ = proportion of parents who think that science and higher math are essential
$p_2$ = proportion of students in grades 6–12 who think that science and higher math are
essential
2. $H_0$: $p_1 - p_2 = 0$
3. $H_a$: $p_1 - p_2 \neq 0$
4. $\alpha = 0.05$
5. \[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} \]
6. We are told that the samples were independently selected, but we need to assume that they
were independent random samples from the populations. Also $n_1\hat{p}_1 = 1379(0.62) = 855 \geq 10$,
$n_1(1 - \hat{p}_1) = 1379(0.38) = 524 \geq 10$, $n_2\hat{p}_2 = 1342(0.5) = 671 \geq 10$, and
$n_2(1 - \hat{p}_2) = 1342(0.5) = 671 \geq 10$, so the samples are large enough.
7. \[ \hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{1379(0.62) + 1342(0.5)}{1379 + 1342} = 0.561 \]
\[ z = \frac{0.62 - 0.5}{\sqrt{\dfrac{(0.561)(0.439)}{1379} + \dfrac{(0.561)(0.439)}{1342}}} = 6.306 \]
8. P-value $= 2 \cdot P(Z > 6.306) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that the proportion of
parents who regard science and mathematics as crucial is different from the corresponding
proportion of students in grades 6–12.

11.53 1. $p_1$ = proportion of college graduates who have sunburn
$p_2$ = proportion of people without a high school degree who have sunburn
2. $H_0$: $p_1 - p_2 = 0$
3. $H_a$: $p_1 - p_2 > 0$
4. $\alpha = 0.05$
5. \[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} \]
6. We are told to assume that the samples were random samples from the populations. Also
$n_1\hat{p}_1 = 200(0.43) = 86 \geq 10$, $n_1(1 - \hat{p}_1) = 200(0.57) = 114 \geq 10$, $n_2\hat{p}_2 = 200(0.25) = 50 \geq 10$,
and $n_2(1 - \hat{p}_2) = 200(0.75) = 150 \geq 10$, so the samples are large enough.
7. \[ \hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{200(0.43) + 200(0.25)}{200 + 200} = 0.34 \]
\[ z = \frac{0.43 - 0.25}{\sqrt{\dfrac{(0.34)(0.66)}{200} + \dfrac{(0.34)(0.66)}{200}}} = 3.800 \]
8. P-value $= P(Z > 3.800) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that the proportion
experiencing sunburn is greater for college graduates than it is for those without a high school
degree.


11.55 a 1. $p_1$ = proportion of Austrian avid mountain bikers who have a low sperm count
$p_2$ = proportion of Austrian nonbikers who have a low sperm count
2. $H_0$: $p_1 - p_2 = 0$
3. $H_a$: $p_1 - p_2 > 0$
4. $\alpha = 0.05$
5. \[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} \]
6. We are told to assume that the percentages were based on independent samples and that
the samples were representative of Austrian avid mountain bikers and nonbikers. So it is
reasonable to assume that the samples were independent random samples. Also
$n_1\hat{p}_1 = 100(0.9) = 90 \geq 10$, $n_1(1 - \hat{p}_1) = 100(0.1) = 10 \geq 10$, $n_2\hat{p}_2 = 100(0.26) = 26 \geq 10$,
and $n_2(1 - \hat{p}_2) = 100(0.74) = 74 \geq 10$, so the samples are large enough.
7. \[ \hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{100(0.9) + 100(0.26)}{100 + 100} = 0.58 \]
\[ z = \frac{0.9 - 0.26}{\sqrt{\dfrac{(0.58)(0.42)}{100} + \dfrac{(0.58)(0.42)}{100}}} = 9.169 \]
8. P-value $= P(Z > 9.169) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that the proportion
of Austrian avid mountain bikers with a low sperm count is higher than the equivalent
proportion for Austrian nonbikers.

b No. Since this is an observational study, causation cannot be inferred from the result. It could
be suggested that, for example, Austrian men who have low sperm counts have a tendency to
choose mountain biking as a hobby.


11.57 Since the data given are population characteristics, an inference procedure is not applicable. It is
known that the rate of Lou Gehrig’s disease amongst soldiers sent to the war is higher than for
those not sent to the war.


11.59 a_ First hypothesis test 
P, = proportion of those receiving the intervention whose references to sex decrease to zero 


P» = proportion of those not receiving the intervention whose references to sex decrease to 


Zero 
Ay: Pp; — Pp, =9 
f,: p, — p, #9 


(Note: We know that the researchers were using two-sided alternative hypotheses, otherwise 
the P-value greater than 0.5 in the second hypothesis test would not have been possible for 
the given results.) 


Since P-value = 0.05, Ho is rejected at the 0.05 level. 


Second hypothesis test 
$p_1$ = proportion of those receiving the intervention whose references to substance abuse decrease to zero
$p_2$ = proportion of those not receiving the intervention whose references to substance abuse decrease to zero
$H_0$: $p_1 - p_2 = 0$
$H_a$: $p_1 - p_2 \ne 0$

Since P-value = 0.61, $H_0$ is not rejected at the 0.05 level.


Third hypothesis test 
$p_1$ = proportion of those receiving the intervention whose profiles are set to “private” at follow-up
$p_2$ = proportion of those not receiving the intervention whose profiles are set to “private” at follow-up
$H_0$: $p_1 - p_2 = 0$
$H_a$: $p_1 - p_2 \ne 0$

Since P-value = 0.45, $H_0$ is not rejected at the 0.05 level.


Fourth hypothesis test 
$p_1$ = proportion of those receiving the intervention whose profiles show any of the three protective changes
$p_2$ = proportion of those not receiving the intervention whose profiles show any of the three protective changes
$H_0$: $p_1 - p_2 = 0$
$H_a$: $p_1 - p_2 \ne 0$

Since P-value = 0.07, $H_0$ is not rejected at the 0.05 level.




b If we want to know whether the email intervention reduces (as opposed to changes) 
adolescents’ display of risk behavior in their profiles, then we use one-sided alternative 
hypotheses and the P-values are halved. If that is the case, using a 0.05 significance level, we 
are convinced that the intervention is effective with regard to reduction of references to sex 
and that the proportion showing any of the three protective changes is greater for those 
receiving the email intervention. Each of the other two apparently reduced proportions could 
have occurred by chance. 


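11.61 a 1. $\mu_1$ = mean appropriateness score assigned to wearing a hat in class for students
$\mu_2$ = mean appropriateness score assigned to wearing a hat in class for faculty

2. $H_0$: $\mu_1 - \mu_2 = 0$

3. $H_a$: $\mu_1 - \mu_2 \ne 0$

4. $\alpha = 0.05$

5. $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$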


6. We are told that the samples were random samples from the populations. Also
$n_1 = 173 \ge 30$ and $n_2 = 98 \ge 30$, so we can proceed with the two-sample t test.


7. $t = \dfrac{2.80 - 3.63}{\sqrt{\dfrac{1.0^2}{173} + \dfrac{1.0^2}{98}}} = -6.565$

8. df = 201.549

P-value $= 2 \cdot P(t_{201.549} < -6.565) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the mean
appropriateness score assigned to wearing a hat in class differs for students and faculty.
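
The two-sample t statistic and its approximate degrees of freedom can be reproduced from summary statistics. The sketch below uses the unpooled (Welch) formula for df. The summary values passed in (means 2.80 and 3.63, both standard deviations 1.0) are assumptions read from the partly garbled calculation above, chosen because they reproduce the printed t and df; the function name is illustrative.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with unpooled variances and approximate df."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Assumed summary statistics (students first, faculty second).
t, df = welch_t(2.80, 1.0, 173, 3.63, 1.0, 98)
print(round(t, 3), round(df, 3))
```

Because df is computed from the sample variances, it is generally not a whole number, which is why the solutions quote values such as 201.549.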


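b 1. $\mu_1$ = mean appropriateness score assigned to addressing an instructor by his/her first
name for students
$\mu_2$ = mean appropriateness score assigned to addressing an instructor by his/her first
name for faculty

2. $H_0$: $\mu_1 - \mu_2 = 0$

3. $H_a$: $\mu_1 - \mu_2 > 0$

4. $\alpha = 0.05$

5. $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$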


6. We are told that the samples were random samples from the populations. Also
$n_1 = 173 \ge 30$ and $n_2 = 98 \ge 30$, so we can proceed with the two-sample t test.


7. $t = \dfrac{2.90 - 2.11}{\sqrt{\dfrac{1.0^2}{173} + \dfrac{1.0^2}{98}}} = 6.249$

8. df = 201.549

P-value $= P(t_{201.549} > 6.249) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the mean
appropriateness score assigned to addressing an instructor by his/her first name is greater
for students than for faculty.


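c 1. $\mu_1$ = mean appropriateness score assigned to talking on a cell phone in class for students
$\mu_2$ = mean appropriateness score assigned to talking on a cell phone in class for faculty

2. $H_0$: $\mu_1 - \mu_2 = 0$

3. $H_a$: $\mu_1 - \mu_2 \ne 0$

4. $\alpha = 0.05$

5. $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$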


6. We are told that the samples were random samples from the populations. Also
$n_1 = 173 \ge 30$ and $n_2 = 98 \ge 30$, so we can proceed with the two-sample t test.

7. $t = \dfrac{1.11 - 1.10}{\sqrt{\dfrac{1.0^2}{173} + \dfrac{1.0^2}{98}}} = 0.079$

8. df = 201.549

P-value $= 2 \cdot P(t_{201.549} > 0.079) = 0.937$


9. Since P-value $= 0.937 > 0.05$ we do not reject $H_0$. We do not have convincing evidence
that the mean appropriateness score assigned to talking on a cell phone in class differs for
students and faculty.


No, this does not imply that students and faculty consider it acceptable to talk on a cell phone
during class. In fact, the low sample mean ratings for both students and faculty show that both
groups on the whole feel that the behavior is inappropriate.
11.63


[Table: Difference (Initial - 9 Day) for the treatment group and Difference (Initial - 9 Day) for the control group.]


a 1. $\mu_d$ = mean difference in selenium level (initial level - 9-day level) for cows receiving
supplement
2. $H_0$: $\mu_d = 0$
3. $H_a$: $\mu_d < 0$
4. $\alpha = 0.05$
5. $t = \dfrac{\bar{x}_d - \text{hypothesized value}}{s_d/\sqrt{n}}$
6. [Boxplot of the differences; horizontal axis "Difference", from -160 to -70.]
The boxplot shows a distribution of sample differences that is negatively skewed, but in a 
small sample (along with the fact that there are no outliers) this is nonetheless consistent 
with an assumption of normality in the population of differences. Additionally, we need 
to assume that the cows who received the supplement form a random sample from the set 
of all cows. 
7. $\bar{x}_d = -104.731$, $s_d = 24.101$
$t = \dfrac{-104.731 - 0}{24.101/\sqrt{16}} = -17.382$
8. df = 15

P-value $= P(t_{15} < -17.382) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the mean
selenium concentration is greater after 9 days of the selenium supplement.
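
The paired t statistic in step 7 depends only on the mean and standard deviation of the differences and on the number of pairs. A minimal sketch of that arithmetic, with an illustrative function name:

```python
import math

def paired_t(mean_diff, sd_diff, n, hypothesized=0.0):
    """Paired t statistic computed from summary statistics of the differences."""
    return (mean_diff - hypothesized) / (sd_diff / math.sqrt(n))

# Summary values from step 7: mean difference, SD of differences, number of cows.
t = paired_t(-104.731, 24.101, 16)
df = 16 - 1
print(round(t, 3), df)
```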


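b 1. $\mu_d$ = mean difference in selenium level (initial level - 9-day level) for cows not
receiving supplement
2. $H_0$: $\mu_d = 0$
3. $H_a$: $\mu_d \ne 0$
4. $\alpha = 0.05$
5. $t = \dfrac{\bar{x}_d - \text{hypothesized value}}{s_d/\sqrt{n}}$
6. [Boxplot of the differences; horizontal axis "Difference", from -1 to 3.]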


Since the boxplot is roughly symmetrical and there are no outliers we are justified in 
assuming a normal distribution for the population of differences. Additionally, we need to 
assume that the cows who did not receive the supplement form a random sample from the 
set of all cows. 

7. $\bar{x}_d = 0.693$, $s_d = 1.062$
$t = \dfrac{0.693 - 0}{1.062/\sqrt{14}} = 2.440$
8. df = 13

P-value $= 2 \cdot P(t_{13} > 2.440) = 0.030$
9. Since P-value $= 0.030 < 0.05$ we reject $H_0$. At the 0.05 level the results are inconsistent
with the hypothesis of no change in mean selenium concentration over the 9-day
period for cows that did not receive the supplement.


c No, the paired t test would not be appropriate since the treatment and control groups were not
paired samples.


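11.65 1. $p_1$ = proportion of resumes with “white-sounding” names that receive responses
$p_2$ = proportion of resumes with “black-sounding” names that receive responses

2. $H_0$: $p_1 - p_2 = 0$
3. $H_a$: $p_1 - p_2 > 0$

4. $\alpha = 0.05$

5. $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1-\hat{p}_c)}{n_2}}}$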

6. We need to assume that the 5000 jobs applied for were randomly assigned to the names used.
Also, $n_1\hat{p}_1 = 2500(250/2500) = 250 \ge 10$, $n_1(1-\hat{p}_1) = 2500(2250/2500) = 2250 \ge 10$,
$n_2\hat{p}_2 = 2500(167/2500) = 167 \ge 10$, and $n_2(1-\hat{p}_2) = 2500(2333/2500) = 2333 \ge 10$, so the
samples are large enough.
7. $\hat{p}_c = \dfrac{250 + 167}{2500 + 2500} = \dfrac{417}{5000}$

$z = \dfrac{250/2500 - 167/2500}{\sqrt{\dfrac{(417/5000)(4583/5000)}{2500} + \dfrac{(417/5000)(4583/5000)}{2500}}} = 4.245$

8. P-value $= P(Z > 4.245) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the proportion
eliciting responses is higher for “white-sounding” first names.


11.67 a 1. $\mu_1$ = mean elongation for a square knot for Maxon thread
$\mu_2$ = mean elongation for a Duncan loop for Maxon thread

2. $H_0$: $\mu_1 - \mu_2 = 0$

3. $H_a$: $\mu_1 - \mu_2 \ne 0$

4. $\alpha = 0.05$

5. $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

6. We are told that the types of knot and suture material were randomly assigned to the
specimens. We are also told to assume that the relevant elongation distributions are
approximately normal.

7. $t = \dfrac{10.0 - 11.0}{\sqrt{\dfrac{0.1^2}{10} + \dfrac{0.3^2}{15}}} = -11.952$

8. df = 18.266

P-value $= 2 \cdot P(t_{18.266} < -11.952) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the mean
elongations for the square knot and the Duncan loop for Maxon thread are different.


b 1. $\mu_1$ = mean elongation for a square knot for Ticron thread
$\mu_2$ = mean elongation for a Duncan loop for Ticron thread

2. $H_0$: $\mu_1 - \mu_2 = 0$

3. $H_a$: $\mu_1 - \mu_2 \ne 0$

4. $\alpha = 0.05$

5. $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

6. We are told that the types of knot and suture material were randomly assigned to the
specimens. We are also told to assume that the relevant elongation distributions are
approximately normal.
7. $t = \dfrac{2.5 - 10.9}{\sqrt{\dfrac{0.06^2}{10} + \dfrac{0.4^2}{11}}} = -68.803$

8. df = 10.494

P-value $= 2 \cdot P(t_{10.494} < -68.803) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the mean
elongations for the square knot and the Duncan loop for Ticron thread are different.


c 1. $\mu_1$ = mean elongation for a Duncan loop for Maxon thread
$\mu_2$ = mean elongation for a Duncan loop for Ticron thread

2. $H_0$: $\mu_1 - \mu_2 = 0$

3. $H_a$: $\mu_1 - \mu_2 \ne 0$

4. $\alpha = 0.05$

5. $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

6. We are told that the types of knot and suture material were randomly assigned to the
specimens. We are also told to assume that the relevant elongation distributions are
approximately normal.

7. $t = \dfrac{11.0 - 10.9}{\sqrt{\dfrac{0.3^2}{15} + \dfrac{0.4^2}{11}}} = 0.698$

8. df = 17.789

P-value $= 2 \cdot P(t_{17.789} > 0.698) = 0.494$
9. Since P-value $= 0.494 > 0.05$ we do not reject $H_0$. We do not have convincing evidence
that the mean elongations for the Duncan loop for Maxon thread and Ticron thread are
different.


11.69
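[Table: for each of the eight subjects, the number of objects remembered 1 hr later and 24 hr later, and the difference (1 hr - 24 hr).]


1. $\mu_d$ = mean difference in number of objects remembered (1 hr - 24 hours)
2. $H_0$: $\mu_d = 3$
3. $H_a$: $\mu_d > 3$
4. $\alpha = 0.01$
5. $t = \dfrac{\bar{x}_d - \text{hypothesized value}}{s_d/\sqrt{n}}$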
6. [Boxplot of the differences; horizontal axis "Difference".]
The boxplot is roughly symmetrical but there is one outlier. Nonetheless we will assume that
the population distribution of differences is normal. We are told that the eight students were
selected at random from the large psychology class.
7. $\bar{x}_d = 3.625$, $s_d = 2.066$
$t = \dfrac{3.625 - 3}{2.066/\sqrt{8}} = 0.856$
8. df = 7
P-value $= P(t_7 > 0.856) = 0.210$
9. Since P-value $= 0.210 > 0.01$ we do not reject $H_0$. We do not have convincing evidence that
the mean number of objects remembered after 1 hour exceeds the mean number remembered after 24
hours by more than 3.


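11.71


[Table: for each of the 27 soil specimens, the number of seeds detected by the direct and stratified methods, and the difference (direct - stratified).]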
1. $\mu_d$ = mean difference in number of seeds detected (direct - stratified)
2. $H_0$: $\mu_d = 0$
3. $H_a$: $\mu_d \ne 0$
4. $\alpha = 0.05$
5. $t = \dfrac{\bar{x}_d - \text{hypothesized value}}{s_d/\sqrt{n}}$
6. [Boxplot of the differences; horizontal axis "Difference", from -40 to 20.]
The boxplot shows a distribution of differences that is negatively skewed and has three
outliers, and so the assumption that the population distribution of differences is normal is
dubious. Nonetheless we will proceed with caution. Additionally, we need to assume that this
set of 27 soil samples forms a random sample from the population of soil samples.
7. $\bar{x}_d = -3.407$, $s_d = 13.253$
$t = \dfrac{-3.407 - 0}{13.253/\sqrt{27}} = -1.336$
8. df = 26
P-value $= 2 \cdot P(t_{26} < -1.336) = 0.193$
9. Since P-value $= 0.193 > 0.05$ we do not reject $H_0$. We do not have convincing evidence that
the mean number of seeds detected differs for the two methods.


11.73


1. $p_1$ = proportion of high school seniors exposed to the drug program who use marijuana
$p_2$ = proportion of high school seniors not exposed to the drug program who use marijuana
2. $H_0$: $p_1 - p_2 = 0$
3. $H_a$: $p_1 - p_2 < 0$
4. $\alpha = 0.05$
5. $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1-\hat{p}_c)}{n_2}}}$
6. We are told that the samples were random samples from the populations. Also
$n_1\hat{p}_1 = 288(141/288) = 141 \ge 10$, $n_1(1-\hat{p}_1) = 288(147/288) = 147 \ge 10$,
$n_2\hat{p}_2 = 335(181/335) = 181 \ge 10$, and $n_2(1-\hat{p}_2) = 335(154/335) = 154 \ge 10$, so the samples
are large enough.
7. $\hat{p}_c = \dfrac{141 + 181}{288 + 335} = \dfrac{322}{623}$

$z = \dfrac{141/288 - 181/335}{\sqrt{\dfrac{(322/623)(301/623)}{288} + \dfrac{(322/623)(301/623)}{335}}} = -1.263$

8. P-value $= P(Z < -1.263) = 0.103$
9. Since P-value $= 0.103 > 0.05$ we do not reject $H_0$. We do not have convincing evidence that
the proportion using marijuana is lower for students exposed to the DARE program.


11.75 Check of Conditions
We are told that the samples were random samples from the two communities. Also,
$n_1\hat{p}_1 = 119(67/119) = 67 \ge 10$, $n_1(1-\hat{p}_1) = 119(52/119) = 52 \ge 10$,
$n_2\hat{p}_2 = 143(106/143) = 106 \ge 10$, and $n_2(1-\hat{p}_2) = 143(37/143) = 37 \ge 10$, so the samples are large
enough.
Calculation
The 90% confidence interval for $p_1 - p_2$ is

$(\hat{p}_1 - \hat{p}_2) \pm (z \text{ critical value})\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

$= \left(\dfrac{67}{119} - \dfrac{106}{143}\right) \pm 1.645\sqrt{\dfrac{(67/119)(52/119)}{119} + \dfrac{(106/143)(37/143)}{143}}$

$= (-0.274, -0.082)$


Interpretation of Interval
We are 90% confident that $p_1 - p_2$ lies between -0.274 and -0.082, where $p_1$ is the proportion
of children in the community with fluoridated water who have decayed teeth and $p_2$ is the
proportion of children in the community without fluoridated water who have decayed teeth.


The interval does not contain zero, which means that we have evidence at the 0.1 level of a
difference between the proportions of children with decayed teeth in the two communities, and
evidence at the 0.05 level that the proportion of children with decayed teeth is smaller in the
community with fluoridated water.
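
The large-sample interval for a difference in proportions can be reproduced as follows. This is a sketch under the usual assumptions (independent random samples, large counts); the function name is illustrative, and the z critical value is obtained from the standard library's `statistics.NormalDist` rather than from a table.

```python
import math
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Counts from the check of conditions: 67 of 119 and 106 of 143.
low, high = two_proportion_ci(67, 119, 106, 143, 0.90)
print(round(low, 3), round(high, 3))
```

Note that, unlike the hypothesis tests earlier in the chapter, the interval uses the two sample proportions separately rather than a pooled estimate.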


11.77 a Check of Conditions
We are told to assume that the peak loudness distributions are approximately normal, and that
the participants were randomly assigned to the conditions.
Calculation
df = 17.276. The 95% confidence interval for $\mu_1 - \mu_2$ is

$(\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value})\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

$= (63 - 54) \pm 2.107\sqrt{\dfrac{13^2}{10} + \dfrac{16^2}{10}}$

$= (-4.738, 22.738)$


Interpretation
We are 95% confident that the difference in mean loudness for open mouthed and closed
mouthed eating of potato chips is between -4.738 and 22.738.


b 1. $\mu_1$ = mean loudness for potato chips (closed-mouth chewing)
$\mu_2$ = mean loudness for tortilla chips (closed-mouth chewing)
2. $H_0$: $\mu_1 - \mu_2 = 0$
3. $H_a$: $\mu_1 - \mu_2 \ne 0$
4. $\alpha = 0.01$
5. $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
6. We are told to assume that the peak loudness distributions are approximately normal, and
that the participants were randomly assigned to the conditions.
7. $t = \dfrac{54 - 53}{\sqrt{\dfrac{16^2}{10} + \dfrac{16^2}{10}}} = 0.140$
8. df = 18
P-value $= 2 \cdot P(t_{18} > 0.140) = 0.890$
9. Since P-value $= 0.890 > 0.01$ we do not reject $H_0$. We do not have convincing evidence
of a difference between potato chips and tortilla chips with respect to mean peak loudness
(closed-mouth chewing).


c 1. $\mu_1$ = mean loudness for stale tortilla chips (closed-mouth chewing)
$\mu_2$ = mean loudness for fresh tortilla chips (closed-mouth chewing)
2. $H_0$: $\mu_1 - \mu_2 = 0$
3. $H_a$: $\mu_1 - \mu_2 < 0$
4. $\alpha = 0.05$
5. $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
6. We are told to assume that the peak loudness distributions are approximately normal, and
that the participants were randomly assigned to the conditions.
7. $t = \dfrac{53 - 56}{\sqrt{\dfrac{16^2}{10} + \dfrac{14^2}{10}}} = -0.446$
8. df = 17.688
P-value $= P(t_{17.688} < -0.446) = 0.330$
9. Since P-value $= 0.330 > 0.05$ we do not reject $H_0$. We do not have convincing evidence
that fresh tortilla chips are louder than stale tortilla chips.


11.79 a
1. $\mu_d$ = mean difference in systolic blood pressure between dental setting and medical
setting (dental - medical)
2. $H_0$: $\mu_d = 0$
3. $H_a$: $\mu_d > 0$
4. $\alpha = 0.01$
5. $t = \dfrac{\bar{x}_d - \text{hypothesized value}}{s_d/\sqrt{n}}$
6. We need to assume that the subjects formed a random sample of patients. With this
assumption, since $n = 60 \ge 30$, we can proceed with the paired t test.
7. $t = \dfrac{4.47 - 0}{8.77/\sqrt{60}} = 3.948$
8. df = 59
P-value $= P(t_{59} > 3.948) \approx 0$

9. Since P-value $\approx 0 < 0.01$ we reject $H_0$. We have strong evidence that the mean blood
pressure is higher in a dental setting than in a medical setting.


b 1. $\mu_d$ = mean difference in pulse rate between dental setting and medical setting (dental -
medical)
2. $H_0$: $\mu_d = 0$
3. $H_a$: $\mu_d \ne 0$
4. $\alpha = 0.05$
5. $t = \dfrac{\bar{x}_d - \text{hypothesized value}}{s_d/\sqrt{n}}$
6. We need to assume that the subjects formed a random sample of patients. With this
assumption, since $n = 60 \ge 30$, we can proceed with the paired t test.
7. $t = \dfrac{-1.33 - 0}{8.84/\sqrt{60}} = -1.165$
8. df = 59

P-value $= 2 \cdot P(t_{59} < -1.165) = 0.249$

9. Since P-value $= 0.249 > 0.05$ we do not reject $H_0$. We do not have convincing evidence
that the mean pulse rate in a dental setting is different from the mean pulse rate in a
medical setting.


11.81 1. $p_1$ = proportion of adults who were born deaf who remove the implant
$p_2$ = proportion of adults who became deaf after learning to speak who remove the implant

2. $H_0$: $p_1 - p_2 = 0$
3. $H_a$: $p_1 - p_2 \ne 0$
4. $\alpha = 0.01$
5. $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1-\hat{p}_c)}{n_2}}}$

6. We need to assume that the samples were independent random samples from the populations.
Also, $n_1\hat{p}_1 = 250(75/250) = 75 \ge 10$, $n_1(1-\hat{p}_1) = 250(175/250) = 175 \ge 10$,
$n_2\hat{p}_2 = 250(25/250) = 25 \ge 10$, and $n_2(1-\hat{p}_2) = 250(225/250) = 225 \ge 10$, so the samples are
large enough.
7. $\hat{p}_c = \dfrac{75 + 25}{250 + 250} = 0.2$

$z = \dfrac{75/250 - 25/250}{\sqrt{\dfrac{(0.2)(0.8)}{250} + \dfrac{(0.2)(0.8)}{250}}} = 5.590$

8. P-value $= 2 \cdot P(Z > 5.590) \approx 0$
9. Since P-value $\approx 0 < 0.01$ we reject $H_0$. We have convincing evidence that the proportion of
adults who were born deaf who remove the implant is different from the proportion of adults
who became deaf after learning to speak who remove the implant.
Chapter 12
The Analysis of Categorical Data and Goodness-of-Fit Tests


Note: In this chapter, numerical answers were found using values from a calculator. Students using
statistical tables will find that their answers differ slightly from those given.


12.1 a P-value $= P(\chi^2_2 > 7.5) = 0.024$. $H_0$ is not rejected.
b P-value $= P(\chi^2_6 > 13.0) = 0.043$. $H_0$ is not rejected.
c P-value $= P(\chi^2_9 > 18.0) = 0.035$. $H_0$ is not rejected.
d P-value $= P(\chi^2_4 > 21.3) = 0.0002$. $H_0$ is rejected.
e P-value $= P(\chi^2_3 > 5.0) = 0.172$. $H_0$ is not rejected.


12.3 a The expected counts are 80, 60, 40, and 20, which are all greater than or equal to 5, so the
chi-square test can be used. P-value $= P(\chi^2_3 > 19.0) = 0.0002 < 0.001$, so $H_0$ is rejected. We have
convincing evidence that the proportions of the four types of nut are not as they are supposed
to be.


b The smallest expected count would be 40(0.1) = 4, which is less than 5. So the chi-square test
would not be appropriate.


12.5


Ethnicity      | African-American | …      | Caucasian | Hispanic
Observed Count | 57               | …      | …         | 6
Expected Count | 71.508           | 12.928 | 296.536   | 23.028


1. Let $p_1$, $p_2$, $p_3$, and $p_4$ be the proportions of appearances of the four ethnicities across all
commercials.
2. $H_0$: $p_1 = 0.177$, $p_2 = 0.032$, $p_3 = 0.734$, $p_4 = 0.057$
3. $H_a$: $H_0$ is not true
4. $\alpha = 0.01$
5. $X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$
6. We need to assume that the set of commercials included in the study form a random sample
from the population of commercials. All the expected counts are greater than 5.
7. $X^2 = \dfrac{(57 - 71.508)^2}{71.508} + \cdots + \dfrac{(6 - 23.028)^2}{23.028} = 19.599$
8. df = 3
P-value $= P(\chi^2_3 > 19.599) \approx 0$
9. Since P-value $\approx 0 < 0.01$ we reject $H_0$. We have convincing evidence that the proportions of
appearances in commercials are not the same as the census proportions.
12.7


[Table: Tar Level; Observed Count; Expected Count. Each expected count is 298.5.]


1. Let $p_1$, $p_2$, $p_3$, and $p_4$ be the proportions of all male smoker lung cancer deaths for smokers
of cigarettes of the given tar levels.
2. $H_0$: $p_1 = 0.25$, $p_2 = 0.25$, $p_3 = 0.25$, $p_4 = 0.25$
3. $H_a$: $H_0$ is not true
4. $\alpha = 0.05$
5. $X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$
6. We are told to regard the sample as representative of male smokers who die of lung cancer,
so it is reasonable to treat the sample as a random sample from that population. All the
expected counts are greater than 5.
7. $X^2 = \dfrac{(103 - 298.5)^2}{298.5} + \cdots + \dfrac{(150 - 298.5)^2}{298.5} = 457.464$
8. df = 3
P-value $= P(\chi^2_3 > 457.464) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the proportion of
male smoker lung cancer deaths is not the same for the four given tar level categories.


12.9


[Table: Time of Day (eight 3-hour periods); Observed Count; Expected Count. Each expected count is 715(0.125) = 89.375.]


1. Let $p_1, \ldots, p_8$ be the proportions of fatal bicycle accidents occurring in the given time
periods.
2. $H_0$: $p_1 = p_2 = \cdots = p_8 = 0.125$
3. $H_a$: $H_0$ is not true
4. $\alpha = 0.05$
5. $X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$


6. We are told to regard the 715 accidents included in the study as a random sample from
the population of fatal bicycle accidents. All the expected counts are greater than 5.
7. $X^2 = \dfrac{(38 - 89.375)^2}{89.375} + \cdots + \dfrac{(113 - 89.375)^2}{89.375} = 166.958$
8. df = 7
P-value $= P(\chi^2_7 > 166.958) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that fatal bicycle
accidents are not equally likely to occur in each of the 3-hour time periods given.


Time Period      | Observed Count | Expected Count
Midnight to Noon | 210            | 238.333
Noon to Midnight | 505            | 476.667

1. Let $p_1$ and $p_2$ be the proportions of fatal bicycle accidents occurring between midnight
and noon and between noon and midnight, respectively.
2. $H_0$: $p_1 = 1/3$, $p_2 = 2/3$
3. $H_a$: $H_0$ is not true
4. $\alpha = 0.05$
5. $X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$
6. We are told to regard the 715 accidents included in the study as a random sample from
the population of fatal bicycle accidents. Both of the expected counts are greater than 5.
7. $X^2 = \dfrac{(210 - 238.333)^2}{238.333} + \dfrac{(505 - 476.667)^2}{476.667} = 5.052$
8. df = 1
P-value $= P(\chi^2_1 > 5.052) = 0.025$
9. Since P-value $= 0.025 < 0.05$ we reject $H_0$. Using a 0.05 significance level, we have
convincing evidence that fatal bicycle accidents do not occur as stated in the hypothesis.


12.11
Age         | Observed Count | Expected Count
18-34       | 36             | 70
35-64       | 130            | 102
65 and over | 34             | 28

1. Let $p_1$, $p_2$, and $p_3$ be the proportions of lottery ticket purchasers who fall into the given age
categories.
2. $H_0$: $p_1 = 0.35$, $p_2 = 0.51$, $p_3 = 0.14$
3. $H_a$: $H_0$ is not true
4. $\alpha = 0.05$
5. $X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$
6. We are told to assume that the 200 people in the study form a random sample of lottery ticket
purchasers. All the expected counts are greater than 5.
7. $X^2 = \dfrac{(36 - 70)^2}{70} + \cdots + \dfrac{(34 - 28)^2}{28} = 25.486$
8. df = 2
P-value $= P(\chi^2_2 > 25.486) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that one or more of
these three age groups buys a disproportionate share of lottery tickets.
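
The goodness-of-fit statistic can be recomputed from the observed counts and the hypothesized proportions. In the sketch below, the observed counts 36 and 34 are taken from the calculation in step 7; the middle count of 130 is an inferred value, obtained from the stated total of 200 people. The function name is illustrative. The closed-form P-value used here, $e^{-x/2}$, is valid only for 2 degrees of freedom.

```python
import math

def chi_square_gof(observed, proportions):
    """Goodness-of-fit: return (expected counts, X^2 statistic, df)."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic, len(observed) - 1

# 36 and 34 come from step 7; 130 is inferred from the total of 200 (assumption).
observed = [36, 130, 34]
expected, x2, df = chi_square_gof(observed, [0.35, 0.51, 0.14])

# For df = 2 only, P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-x2 / 2)
print([round(e, 3) for e in expected], round(x2, 3), df, p_value)
```

For other degrees of freedom the upper-tail probability has no such simple form, and a calculator, table, or statistics library would be needed.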
12.13
[Table: Phenotype; Observed Count; Expected Count, for the four phenotypes.]
1. Let $p_1$, $p_2$, $p_3$, and $p_4$ be the proportions of phenotypes resulting from the given process.
2. $H_0$: $p_1 = 9/16$, $p_2 = 3/16$, $p_3 = 3/16$, $p_4 = 1/16$
3. $H_a$: $H_0$ is not true
4. $\alpha = 0.01$
5. $X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

6. We need to assume that the plants included in the study form a random sample from the
population of such plants. All the expected counts are greater than 5.

7. $X^2 = \dfrac{(926 - 906.1875)^2}{906.1875} + \cdots + \dfrac{(104 - 100.6875)^2}{100.6875} = 1.469$

8. df = 3

P-value $= P(\chi^2_3 > 1.469) = 0.690$
9. Since P-value $= 0.690 > 0.01$ we do not reject $H_0$. We do not have convincing evidence that
the data from this experiment are not consistent with Mendel’s laws.

12.15 a df $= (4-1)(5-1) = 12$. P-value $= P(\chi^2_{12} > 7.2) = 0.844$. Since the P-value is greater than 0.1,
we do not have convincing evidence that education level and preferred candidate are not
independent.

b df $= (4-1)(4-1) = 9$. P-value $= P(\chi^2_9 > 14.5) = 0.106$. Since the P-value is greater than
0.05, we do not have convincing evidence that education level and preferred candidate are not
independent.

12.17


[Table: class standing by body art response (Body Piercings Only, Tattoos Only, Both Body Piercing and Tattoos, No Body Art), with expected counts in parentheses; one recoverable cell is 17 (10.327).]


$H_0$: Class standing and body art response are independent
$H_a$: Class standing and body art response are not independent
$\alpha = 0.01$

$X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

We are told to regard the sample as representative of the students at this university, so we are
justified in treating it as a random sample from that population. All the expected counts are
greater than 5.

$X^2 = \dfrac{(\ldots - 49.714)^2}{49.714} + \cdots + \dfrac{(\ldots - 57.969)^2}{57.969} = 29.507$

df = 9
P-value $= P(\chi^2_9 > 29.507) = 0.001$


Since P-value $= 0.001 < 0.01$ we reject $H_0$. We have convincing evidence of an association
between class standing and response to the body art question.


12.19 a $H_0$: Field of study and smoking status are independent
$H_a$: Field of study and smoking status are not independent
$\alpha = 0.01$
$X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

We are told that the sample was a random sample from the population. All the expected
counts are greater than 5.


$X^2 = 90.853$
df = 8
P-value $\approx 0$


Since P-value $\approx 0 < 0.01$ we reject $H_0$. We have convincing evidence that smoking status and
field of study are not independent.


b_ The particularly high contributions to the chi-square statistic (in order of importance) come 
from the field of communication, languages, and cultural studies, where there was a 
disproportionately high number of smokers, the field of mathematics, engineering, and 
sciences, where there was a disproportionately low number of smokers, and the field of social 
science and human services, where there was a disproportionately high number of smokers. 


12.21 a


[Table: gender (male, female) by response (Usually Eat 3 Meals a Day, Rarely Eat 3 Meals a Day), with expected counts in parentheses.]


$H_0$: The proportions falling into the two response categories are the same for males and
females.

$H_a$: The proportions falling into the two response categories are not the same for males and
females.

$\alpha = 0.05$

$X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

We are told to assume that the samples of male and female students were random samples
from the populations. All the expected counts are greater than 5.

$X^2 = \dfrac{(26 - 21.755)^2}{21.755} + \cdots + \dfrac{(54 - 49.755)^2}{49.755} = 2.314$

df = 1
P-value $= P(\chi^2_1 > 2.314) = 0.128$

Since P-value $= 0.128 > 0.05$ we do not reject $H_0$. We do not have convincing evidence that
the proportions falling into the two response categories are not the same for males and
females.


b Yes.


c Yes. Since P-value $= 0.127 > 0.05$ we do not reject $H_0$. We do not have convincing evidence
that the proportions falling into the two response categories are not the same for males and
females.


d The two P-values are almost equal. In fact, the difference between them is only due to
rounding errors in the MINITAB program. In other words, if complete accuracy had been
maintained throughout, the two P-values would have been exactly equal. (Also, the chi-square
statistic in Part (a) is the square of the z statistic in Part (c).) It should not be surprising
that the P-values are at least similar, since both measure the probability of getting sample
proportions at least as far from the expected proportions, given that the proportions who
usually eat three meals per day are the same for the two populations.


12.23 a


[Table: gift treatment (three treatments) by response (Donation, No Donation), with expected counts in parentheses; one recoverable cell is 691 (527.919).]


$H_0$: The proportions falling into the two donation categories are the same for all three gift
treatments.

$H_a$: The proportions falling into the two donation categories are not the same for all three gift
treatments.

$\alpha = 0.01$

$X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

We are told that the three treatments were assigned at random. All the expected counts are
greater than 5.

$X^2 = \dfrac{(397 - 514.512)^2}{514.512} + \cdots + \dfrac{(2656 - 2819.081)^2}{2819.081} = 96.506$

df = 2

P-value $= P(\chi^2_2 > 96.506) \approx 0$

Since P-value $\approx 0 < 0.01$ we reject $H_0$. We have convincing evidence that the proportions
falling into the two donation categories are not the same for all three gift treatments.


b The result of Part (a) tells us that the level of the gift seems to make a difference. Looking at 
the data given, 12% of those receiving no gift made a donation, 14% of those receiving a 
small gift made a donation, and 21% of those receiving a large gift made a donation. (These 
percentages can be compared to 16% making donations amongst the expected counts.) So it 
seems that the most effective strategy is to include a large gift, with the small gift making 
very little difference compared to no gift at all. 
12.25


School Performance | Alcohol Exposure Group
                   | 1            | 2            | 3            | 4
Excellent          | 110 (79.25)  | 93 (79.25)   | 49 (79.25)   | 65 (79.25)
Good               | 328 (316)    | 325 (316)    | 316 (316)    | 295 (316)
Average/Poor       | 239 (281.75) | 259 (281.75) | 312 (281.75) | 317 (281.75)


$H_0$: Alcohol exposure and school performance are independent
$H_a$: Alcohol exposure and school performance are not independent
$\alpha = 0.05$
$X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$


We are told to regard the sample as a random sample of German adolescents. All the expected
counts are greater than 5.


$X^2 = \dfrac{(110 - 79.25)^2}{79.25} + \cdots + \dfrac{(317 - 281.75)^2}{281.75} = 46.515$


df = 6
P-value $= P(\chi^2_6 > 46.515) \approx 0$
Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence of an association between
alcohol exposure and school performance.
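
The expected counts in a test of independence come from the row and column totals: each expected count is (row total)(column total)/(grand total). The sketch below recomputes the expected counts, the statistic, and df for the observed table above; the function name is illustrative, and the observed counts are those read from the table.

```python
def chi_square_independence(table):
    """Return (expected counts, X^2 statistic, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df

# Rows: school performance categories; columns: alcohol exposure groups 1 to 4.
observed = [
    [110, 93, 49, 65],
    [328, 325, 316, 295],
    [239, 259, 312, 317],
]
expected, x2, df = chi_square_independence(observed)
print(expected[0][0], expected[1][0], expected[2][0], round(x2, 3), df)
```

Since every column total here is 677, the expected counts are the same across each row, which matches the parenthesized values in the table.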


12.27


[Table: number of sweet drinks consumed per day (four categories) by overweight status (Yes, No), with expected counts in parentheses.]


$H_0$: Number of sweet drinks consumed per day and weight status are independent
$H_a$: Number of sweet drinks consumed per day and weight status are not independent
$\alpha = 0.05$

$X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

We are told to regard the sample as representative of 2- to 3-year-old children, so we are justified
in treating it as a random sample from that population. All the expected counts are greater than 5.

$X^2 = \dfrac{(22 - 28.921)^2}{28.921} + \cdots + \dfrac{(3390 - 3385.915)^2}{3385.915} = 3.030$

df = 3

P-value $= P(\chi^2_3 > 3.030) = 0.387$

Since P-value $= 0.387 > 0.05$ we do not reject $H_0$. We do not have convincing evidence of an
association between whether or not children are overweight after one year and the number of
sweet drinks consumed.
12.29


[Table: vehicle type (four categories) by city of residence (three cities), with expected counts in parentheses. The recoverable rows are:
…     | 88 (84.511) | 123 (101.553) | 142 (166.936)
Large | 24 (12.689) | 18 (15.247)   | 11 (25.064)]


$H_0$: City of residence and vehicle type are independent
$H_a$: City of residence and vehicle type are not independent
$\alpha = 0.05$

$X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

We are told to regard the sample as a random sample of Bay area residents. All the expected
counts are greater than 5.

$X^2 = \dfrac{(68 - 89.060)^2}{89.060} + \cdots + \dfrac{(11 - 25.064)^2}{25.064} = 49.813$

df = 6

P-value $= P(\chi^2_6 > 49.813) \approx 0$

Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence of an association between
city of residence and vehicle type.


12.31


[Table: sex identification (correct, incorrect) by nose view (Front, Profile, Three-Quarter), with expected counts in parentheses.]


$H_0$: The proportions of correct sex identifications are the same for all three nose views.
$H_a$: The proportions of correct sex identifications are not the same for all three nose views.
$\alpha = 0.05$

$X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

We need to assume that the students were randomly assigned to the nose views. All the expected
counts are greater than 5.

$X^2 = \dfrac{(\ldots - 26)^2}{26} + \cdots + \dfrac{(\ldots - 14)^2}{14} = 1.978$

df = 2
P-value $= P(\chi^2_2 > 1.978) = 0.372$


Since P-value $= 0.372 > 0.05$ we do not reject $H_0$. We do not have convincing evidence that the
proportions of correct sex identifications are not the same for all three nose views.
12.33 a The number of men in the sample who napped is 744(0.38) = 282.72, which we round to 283,
since the number of men who napped must be a whole number. The number of men who did
not nap is therefore 744 - 283 = 461. The observed frequencies for the women are calculated
in a similar way. (The table below also shows the expected frequencies in parentheses.)

      | Napped    | Did Not Nap | Total
Men   | 283 (257) | 461 (487)   | 744
Women | 231 (257) | 513 (487)   | 744

$H_0$: Gender and napping are independent
$H_a$: Gender and napping are not independent
$\alpha = 0.01$

$X^2 = \sum_{\text{all cells}} \dfrac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

We are told that the sample was nationally representative, so we are justified in treating it as a
random sample from the population of American adults. All the expected counts are greater
than 5.

$X^2 = \dfrac{(283 - 257)^2}{257} + \cdots + \dfrac{(513 - 487)^2}{487} = 8.034$

df = 1
P-value $= P(\chi^2_1 > 8.034) = 0.005$

Since P-value $= 0.005 < 0.01$ we reject $H_0$. We have convincing evidence of an association
between gender and napping.


b Yes. We have convincing evidence at the 0.01 significance level of an association between
gender and napping in the population. This is equivalent to saying that we have convincing
evidence at the 0.01 significance level that the proportions of men and women who nap are
different (a two-tailed test of a difference of the proportions). Thus, converting this to a
one-tailed test, since in the sample the proportion of men who napped was greater than the
proportion of women who napped, we have convincing evidence at the 0.005 level that a
greater proportion of men nap than women.


12.35


Count Count 
Sunda 14 14.286 


Let p,,...,p, be the proportions of all fatal bicycle accidents occurring on the seven days. 
Hg Np = Pp =" = 27 =1/7 

H,: Hp is not true 

A005 
5. $X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$
6. We are told that the 100 accidents formed a random sample from the population of fatal 
bicycle accidents. All the expected counts are greater than 5. 
7. $X^2 = \frac{(14-14.286)^2}{14.286} + \cdots + \frac{(15-14.286)^2}{14.286} = 1.08$
8. df = 6
P-value $= P(\chi^2_6 > 1.08) = 0.982$
9. Since P-value = 0.982 > 0.05 we do not reject $H_0$. We do not have convincing evidence that 
the proportion of accidents is not the same for all days of the week. 
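
The chi-square calculations in this chapter all follow the same pattern, and the 2 x 2 napping table from Exercise 12.33 is small enough to check by hand or in a few lines of code. The sketch below is only an illustration, not the method used to produce the manual's figures: the function names are made up for this example, and the P-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable. Because the observed counts were rounded to whole numbers, the statistic lands close to, but not exactly on, the reported 8.034.

```python
import math

# Observed counts from the napping table (rows: men, women; columns: napped, did not nap).
observed = [[283, 461], [231, 513]]

def chi_square_independence(table):
    """Return (expected counts, X^2 statistic) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = (row total)(column total) / (grand total)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(table, expected)
             for o, e in zip(obs_row, exp_row))
    return expected, x2

def chi_square_upper_tail_df1(x2):
    """P(chi-square with 1 df > x2); valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(x2 / 2))

expected, x2 = chi_square_independence(observed)
p_value = chi_square_upper_tail_df1(x2)
print(expected)
print(round(x2, 2), round(p_value, 3))
```

For tables with more than 1 degree of freedom the tail probability needs a chi-square table, a calculator, or a statistics library; the shortcut above does not extend to those cases.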


12.37 


Italy | 600 (400) | 140 (222) | 140 (244) | 90 (90) | 30 (44) | 


$H_0$: The proportions falling into the response categories are all the same for all five countries. 
$H_a$: The proportions falling into each of the response categories are not all the same for all five 
countries. 

$\alpha = 0.01$

$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

We are told that the samples were random samples from the populations. All the expected counts 
are greater than 5. 

$X^2 = \frac{(\ldots - 400)^2}{400} + \cdots + \frac{(\ldots - 44)^2}{44} = 881.360$

df = 16

P-value $= P(\chi^2_{16} > 881.360) \approx 0$

Since P-value $\approx 0 < 0.01$ we reject $H_0$. We have convincing evidence that the response 
proportions are not all the same for all five countries. 


12.39 
1. Let $p_1$, $p_2$, $p_3$, and $p_4$ be the proportions of homicides occurring in the four seasons. 
2. $H_0$: $p_1 = \cdots = p_4 = 0.25$
3. $H_a$: $H_0$ is not true 
4. $\alpha = 0.05$
5. $X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$
6. We need to assume that the 1361 homicides form a random sample from the population of 
homicides. All the expected counts are greater than 5. 
7. $X^2 = \frac{(328-340.25)^2}{340.25} + \cdots + \frac{(327-340.25)^2}{340.25} = 4.035$
8. df = 3
P-value $= P(\chi^2_3 > 4.035) = 0.258$
9. Since P-value = 0.258 > 0.05 we do not reject $H_0$. We do not have convincing evidence that 
the homicide rate is not the same over the four seasons. 


12.41 


aad 
Position | Chase in Chase 


$H_0$: Position and role are independent 
$H_a$: Position and role are not independent 
$\alpha = 0.01$

$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

Each of the 183 observations in the sample is a particular lioness on a particular hunt. 
(Presumably several observations could have been gathered for a single lioness, each for a 
different hunt. Likewise, several observations could have been gathered from a single hunt, each 
for a different lioness.) We need to assume that these 183 observations form a random sample of 
lioness-hunts. All the expected counts are greater than 5. 

$X^2 = \frac{(28-39.038)^2}{39.038} + \cdots + \frac{(41-52.038)^2}{52.038} = 10.976$

df = 1

P-value $= P(\chi^2_1 > 10.976) = 0.001$

Since P-value = 0.001 < 0.01 we reject $H_0$. We have convincing evidence of an association 
between position and role. 


The required assumption is given above. 


12.43 


$H_0$: Response (agree/disagree) and region of residence are independent 
$H_a$: Response (agree/disagree) and region of residence are not independent 
$\alpha = 0.01$

$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

We are told that the sample was a random sample of adults. All the expected counts are greater 
than 5. 

$X^2 = \frac{(130-150.350)^2}{150.350} + \cdots + \frac{(\ldots - 69.121)^2}{69.121} = 22.855$

df = 3

P-value $= P(\chi^2_3 > 22.855) \approx 0$

Since P-value $\approx 0 < 0.01$ we reject $H_0$. We have convincing evidence of an association between 
response and region of residence. 


12.45 a

Astrological Sign | Observed Count | Expected Count


1. Let $p_1, \ldots, p_{12}$ be the proportions of male insured drivers born under the twelve 
astrological signs. 
2. $H_0$: $p_1 = p_2 = \cdots = p_{12} = 1/12$
3. $H_a$: $H_0$ is not true 
4. $\alpha = 0.05$
5. $X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$
6. We are told to treat the male policyholders of this company as a random sample of male 
insured drivers from Australia. All the expected counts are greater than 5. 
7. $X^2 = \frac{(35666-38347.333)^2}{38347.333} + \cdots + \frac{(\ldots - 38347.333)^2}{38347.333} = 8216.476$
8. df = 11
P-value $= P(\chi^2_{11} > 8216.476) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the proportions 
of male insured drivers for the twelve astrological signs are not all equal. 


b This could occur if the birthrate is higher for the time of year designated as “Capricorn” than 
it is for other times of the year. 
c The total number of policyholders listed in the first table is 460168. Therefore, for example, 
the proportion of policyholders born under Aquarius is 35666/460168. The total number of 
claims listed in the second table is 1000. So, if the numbers of claims were in proportion to 
the numbers of policyholders, then we would expect the number of claims for policyholders 
born under Aquarius to be 1000(35666/460168) = 77.506. This is the expected count for 
Aquarius, and the other expected counts are calculated in a similar way. 
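
The two kinds of expected count used in this exercise (equal proportions in Part (a), proportions matching the policyholder counts in Part (c)) can be checked with a short calculation. This is a minimal sketch using only the Aquarius figures quoted above; the variable names are chosen for this example, and the remaining eleven signs would be handled the same way with their own policyholder counts.

```python
# Figures quoted in the solution; the other eleven signs are not reproduced here.
total_policyholders = 460168
aquarius_policyholders = 35666
total_claims = 1000

# Part (a): under equal proportions each sign has expected count total / 12.
expected_equal = total_policyholders / 12

# Part (c): expected claims in proportion to the number of policyholders.
expected_aquarius_claims = total_claims * aquarius_policyholders / total_policyholders

print(round(expected_equal, 3))
print(round(expected_aquarius_claims, 3))
```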


Astrological Sign | Observed Count | Expected Count
Aquarius          | 85             | 77.506
Aries             |                |
Cancer            |                |
Capricorn         |                |
Gemini            | 83             | 80.794


1. Let $p_1, \ldots, p_{12}$ be the proportions of all claims for drivers born under the 12 astrological 
signs. 
2. $H_0$: $p_1 = 35666/460168$, $p_2 = 37926/460168$, and so on, using the data given in the first 
table. 
3. $H_a$: $H_0$ is not true 
4. $\alpha = 0.05$
5. $X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$
6. We are told that this is a random sample of claims for this company. All the expected 
counts are greater than 5. 
7. $X^2 = \frac{(85-77.506)^2}{77.506} + \cdots + \frac{(81-81.966)^2}{81.966} = 10.748$
8. df = 11
P-value $= P(\chi^2_{11} > 10.748) = 0.465$
9. Since P-value = 0.465 > 0.05 we do not reject $H_0$. We do not have convincing evidence 
that the proportions of claims submitted by drivers born under the twelve astrological 
signs are not equal to the corresponding proportions of policyholders. 
Chapter 13 
Simple Linear Regression and Correlation: Inferential Methods 


Note: In this chapter, numerical answers to questions involving the normal, t, and chi square distributions 
were found using values from a calculator. Students using statistical tables will find that their answers 
differ slightly from those given. 


13.1 a $y = -5.0 + 0.017x$


b When $x = 1000$, $y = -5 + 0.017(1000) = 12$. 
When $x = 2000$, $y = -5 + 0.017(2000) = 29$. 


[Graph of the line $y = -5.0 + 0.017x$ for $x$ from 1000 to 3000]


c When $x = 2100$, $y = -5 + 0.017(2100) = 30.7$. The mean gas usage for houses with 2100 
square feet of space is 30.7 therms. 


d 0.017 therms 
e 100(0.017) = 1.7 therms 


f No. The given relationship only applies to houses whose sizes are between 1000 and 3000 
square feet. The size of this house, 500 square feet, lies outside this range. 


13.3 a When $x = 15$, $\mu_y = 0.135 + 0.003(15) = 0.18$ micrometers. 
When $x = 17$, $\mu_y = 0.135 + 0.003(17) = 0.186$ micrometers. 


b When $x = 15$, $\mu_y = 0.18$, so $P(y > 0.18) = 0.5$. 


c When $x = 14$, $\mu_y = 0.135 + 0.003(14) = 0.177$, so 

$P(y > 0.175) = P\left(z > \frac{0.175 - 0.177}{0.005}\right) = P(z > -0.4) = 0.655$

$P(y < 0.178) = P\left(z < \frac{0.178 - 0.177}{0.005}\right) = P(z < 0.2) = 0.579$
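
The probabilities in Part (c) of 13.3 come from standardizing with the mean on the population regression line and the error standard deviation 0.005. A minimal sketch of that calculation follows, using the error function from the standard library in place of a normal table; the helper name `normal_cdf` is introduced here only for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Values taken from the solution to 13.3.
mean_y = 0.135 + 0.003 * 14      # mean of y when x = 14
sigma = 0.005                    # standard deviation of y about the line

p_above = 1 - normal_cdf((0.175 - mean_y) / sigma)   # P(y > 0.175)
p_below = normal_cdf((0.178 - mean_y) / sigma)       # P(y < 0.178)
print(round(p_above, 3), round(p_below, 3))
```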
13.5 a Average change in price associated with one extra square foot of space = $47. 
Average change in price associated with 100 extra square feet of space = 100(47) = $4700. 


b When x = 1800, the mean price is 23000 + 47(1800) = 107600. 

So P(y > 110000) = P(z > (110000 - 107600)/5000) = P(z > 0.48) = 0.316. 

P(y < 100000) = P(z < (100000 - 107600)/5000) = P(z < -1.52) = 0.064. 
[se ee ee eat 


SSTo 0.356 


b A point estimate of $\sigma$ is $s_e = \sqrt{\frac{\text{SSResid}}{n-2}} = 0.155$. This is a typical deviation of a 
bone mineral density value in the sample from the value predicted by the least-squares line. 


c 0.009 g/cm². 
d When $x = 60$, estimate of mean BMD $= 0.558 + 0.009(60) = 1.098$ g/cm². 


13.9 a The required proportion is $r^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}} = 1 - \frac{2620.57}{22398.05} = 0.883$. 


b $s_e = \sqrt{\frac{\text{SSResid}}{n-2}} = \sqrt{\frac{2620.57}{14}} = 13.682$. The number of degrees of freedom associated with 
this estimate is $n - 2 = 16 - 2 = 14$. 
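
Both quantities in 13.9 depend only on SSResid, SSTo, and the sample size, so they are easy to recompute. The following is a small illustrative sketch using the sums of squares quoted above; the variable names are chosen for this example.

```python
import math

ss_resid = 2620.57     # residual sum of squares
ss_total = 22398.05    # total sum of squares
n = 16                 # sample size

r_squared = 1 - ss_resid / ss_total       # proportion of variation explained
s_e = math.sqrt(ss_resid / (n - 2))       # estimate of sigma, with n - 2 df
print(round(r_squared, 3), round(s_e, 3))
```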


13.11 a 

[Scatterplot for the market share and advertising share data]


The plot shows a linear pattern, and the vertical spread of points does not appear to be 
changing over the range of x values in the sample. If we assume that the distribution of errors 
at any given x value is approximately normal, then the simple linear regression model seems 
appropriate. 


b $\hat{y} = -0.00227 + 1.247x$
When $x = 0.09$, $\hat{y} = -0.00227 + 1.247(0.09) = 0.110$. 


c $r^2 = 0.436$. This tells us that 43.6% of the variation in market share can be explained by the 
linear regression model relating market share and advertising share. 


d A point estimate of $\sigma$ is $s_e = \sqrt{\frac{\text{SSResid}}{n-2}} = 0.026$. The number of degrees of 
freedom associated with this estimate is $n - 2 = 10 - 2 = 8$. 


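13.13 a $S_{xx} = \sum (x - \bar{x})^2 = (5-15)^2 + \cdots + (25-15)^2 = 250$. So $\sigma_b = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{4}{\sqrt{250}} = 0.253$. 


b Now $S_{xx} = 2(250) = 500$. So $\sigma_b = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{4}{\sqrt{500}} = 0.179$. No, $\sigma_b$ is not half of what it was in 
Part (a). 


c Four observations should be taken at each of the x values, since then $S_{xx}$ would be 
multiplied by 4, and so $\sigma_b$ would be divided by 2. To verify, $S_{xx} = 4(250) = 1000$, so 
$\sigma_b = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{4}{\sqrt{1000}} = 0.126 = (1/2)(0.253)$. 


13.15 a $s_e = \sqrt{\frac{\text{SSResid}}{n-2}}$ with $n - 2 = 13$, and $s_b = \frac{s_e}{\sqrt{S_{xx}}} = 0.154$. 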


b We must assume that the conditions for inference are met. 
df= 13. 
The 95% confidence interval for $\beta$ is 

$b \pm (t \text{ critical value}) \cdot s_b = 2.5 \pm (2.160)(0.154) = (2.168, 2.832)$. 

We are 95% confident that the slope of the population regression line relating hardness of 
molded plastic and time elapsed since the molding was completed is between 2.168 and 
2.832. 


c Yes. Since the confidence interval is relatively narrow it seems that $\beta$ has been somewhat 
precisely estimated. 
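
The argument in Exercise 13.13, that the standard deviation of b shrinks with the square root of S_xx, can be checked numerically. The sketch below assumes the design consists of one observation at each of x = 5, 10, 15, 20, 25 (an assumption that is consistent with S_xx = 250 but is not stated in the solution) and then repeats that design two and four times. The function names are illustrative only.

```python
import math

sigma = 4.0                      # error standard deviation used in 13.13
x_values = [5, 10, 15, 20, 25]   # assumed design; gives S_xx = 250

def s_xx(xs):
    """Sum of squared deviations of the x values from their mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def sigma_b(sigma, xs):
    """Standard deviation of the least-squares slope b."""
    return sigma / math.sqrt(s_xx(xs))

# One, two, and four observations at each x value.
results = {copies: (s_xx(x_values * copies), sigma_b(sigma, x_values * copies))
           for copies in (1, 2, 4)}
for copies, (sxx, sb) in results.items():
    print(copies, sxx, round(sb, 3))
```

Doubling the number of observations divides the standard deviation by the square root of 2, not by 2; only quadrupling the design halves it.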
13.17 a $S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 44194 - \frac{(50)(16705)}{20} = 2431.5$
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 150 - \frac{(50)^2}{20} = 25$
$\bar{x} = \frac{50}{20} = 2.5$, $\bar{y} = \frac{16705}{20} = 835.25$
The slope of the population regression line is estimated by $b = \frac{S_{xy}}{S_{xx}} = \frac{2431.5}{25} = 97.26$. 
The y intercept of the population regression line is estimated by $a = \bar{y} - b\bar{x} = 835.25 - 97.26(2.5) = 592.1$. 
b When $x = 2$, $\hat{y} = a + bx = 592.1 + 97.26(2) = 786.62$. 
Residual $= y - \hat{y} = 757 - 786.62 = -29.62$. 
c We require a 99% confidence interval for $\beta$. 
We must assume that the conditions for inference are met. 
$\text{SSResid} = \sum y^2 - a\sum y - b\sum xy = 14194231 - 592.1(16705) - 97.26(44194) = 4892.06$. 
$s_e = \sqrt{\frac{\text{SSResid}}{n-2}} = \sqrt{\frac{4892.06}{18}} = 16.486$. 
$s_b = \frac{s_e}{\sqrt{S_{xx}}} = \frac{16.486}{\sqrt{25}} = 3.297$. 
df = 18. 
The 99% confidence interval for $\beta$ is 
$b \pm (t \text{ critical value}) \cdot s_b = 97.26 \pm (2.878)(3.297) = (87.769, 106.751)$
We are 99% confident that the slope of the population regression line relating amount of 
oxygen consumed and time spent exercising is between 87.769 and 106.751. 
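
The whole chain of calculations in 13.17, from the summary sums to the confidence interval for the slope, can be reproduced in a few lines. This is a sketch under the assumption that the sum of the x values is 50 (implied by the mean 2.5 with n = 20); the t critical value 2.878 is the rounded value quoted in the solution, so the interval endpoints agree with the manual only to about two decimal places.

```python
import math

# Summary statistics from the solution to 13.17.
n = 20
sum_x, sum_y = 50, 16705          # sum_x = 50 is inferred from x-bar = 2.5
sum_x2, sum_y2, sum_xy = 150, 14194231, 44194
t_critical = 2.878                # 99% confidence, 18 df (rounded)

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
b = s_xy / s_xx                   # estimated slope
a = sum_y / n - b * sum_x / n     # estimated intercept

ss_resid = sum_y2 - a * sum_y - b * sum_xy
s_e = math.sqrt(ss_resid / (n - 2))
s_b = s_e / math.sqrt(s_xx)
interval = (b - t_critical * s_b, b + t_critical * s_b)

print(round(b, 2), round(a, 1), round(s_e, 3), round(s_b, 3))
print(tuple(round(v, 2) for v in interval))
```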
13.19 1. $\beta$ = slope of the population regression line relating brain volume change with mean 
childhood blood lead level. 
2. $H_0$: $\beta = 0$
3. $H_a$: $\beta \neq 0$
4. $\alpha = 0.05$
5. $t = \frac{b - (\text{hypothesized value})}{s_b} = \frac{b - 0}{s_b}$
6. We are told to assume that the basic assumptions of the simple linear regression model are 
reasonably met. 
7. $t = -3.66$
8. P-value $\approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the slope of the 
population regression line relating brain volume change with mean childhood blood lead 
level is not equal to zero, that is, that there is a useful linear relationship between these two 
variables. 
13.21 a For the data given, $b = 0.140$, $s_e = 0.402$, $s_b = 0.026$. 
The data are plotted in the scatterplot below. 


[Scatterplot of pleasantness rating against firing frequency]


The plot shows a linear pattern, and the vertical spread of points does not appear to be 
changing over the range of x values in the sample. If we assume that the distribution of errors 
at any given x value is approximately normal, then the simple linear regression model seems 
appropriate. 


df = 8. The 95% confidence interval for $\beta$ is 
$b \pm (t \text{ critical value}) \cdot s_b = 0.140 \pm (2.306)(0.026) = (0.081, 0.199)$. 


We are 95% confident that the mean change in pleasantness rating associated with an 
increase of 1 impulse per second in firing frequency is between 0.081 and 0.199. 


b 1. $\beta$ = slope of the population regression line relating pleasantness rating to firing 
frequency. 
2. $H_0$: $\beta = 0$
3. $H_a$: $\beta \neq 0$
4. $\alpha = 0.05$
5. $t = \frac{b - (\text{hypothesized value})}{s_b} = \frac{b - 0}{s_b}$
6. The conditions for inference were checked in Part (a). 
7. $t = \frac{0.140 - 0}{0.026} = 5.451$
8. df = 8
P-value $= 2 \cdot P(t_8 > 5.451) = 0.001$
9. Since P-value = 0.001 < 0.05 we reject $H_0$. We have convincing evidence of a useful 
linear relationship between firing frequency and pleasantness rating. 


13.23 a 1. $\beta$ = average change in sales revenue associated with a 1-unit increase in advertising 
expenditure. 
2. $H_0$: $\beta = 0$
3. $H_a$: $\beta \neq 0$
4. $\alpha = 0.05$
5. $t = \frac{b - (\text{hypothesized value})}{s_b} = \frac{b - 0}{s_b}$
6. We must assume that the conditions for inference are met. 
7. $t = \frac{52.27 - 0}{8.05} = 6.493$
8. df = 13
P-value $= 2 \cdot P(t_{13} > 6.493) \approx 0$
9. Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the slope of the 
population regression line relating sales revenue and advertising expenditure is not equal 
to zero. 


We conclude that there is a useful linear relationship between sales revenue and advertising 
expenditure. 


b 1. $\beta$ = average change in sales revenue associated with a 1-unit increase in advertising 
expenditure. 
2. $H_0$: $\beta = 40$
3. $H_a$: $\beta > 40$
4. $\alpha = 0.01$
5. $t = \frac{b - (\text{hypothesized value})}{s_b} = \frac{b - 40}{s_b}$
6. We must assume that the conditions for inference are met. 
7. $t = \frac{52.27 - 40}{8.05} = 1.524$
8. df = 13
P-value $= P(t_{13} > 1.524) = 0.076$
9. Since P-value = 0.076 > 0.01 we do not reject $H_0$. We do not have convincing evidence 
that the average change in sales revenue associated with a 1-unit (that is, \$1000) increase 
in advertising expenditure is greater than \$40,000. 


13.25 $S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 4376.36 - \frac{(678)(104.54)}{16} = -53.5225$. 

$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 36056 - \frac{(678)^2}{16} = 7325.75$. 

$\bar{x} = \frac{678}{16} = 42.375$, $\bar{y} = \frac{104.54}{16} = 6.53375$

$b = \frac{S_{xy}}{S_{xx}} = \frac{-53.5225}{7325.75} = -0.00731$

$a = \bar{y} - b\bar{x} = 6.53375 - (-0.00731)(42.375) = 6.843$. 

$\text{SSResid} = \sum y^2 - a\sum y - b\sum xy = \sum y^2 - (6.843)(104.54) - (-0.00731)(4376.36) = 0.01774$

$s_e = \sqrt{\frac{\text{SSResid}}{n-2}} = \sqrt{\frac{0.01774}{14}} = 0.03559$. 

$s_b = \frac{s_e}{\sqrt{S_{xx}}} = \frac{0.03559}{\sqrt{7325.75}} = 0.000416$. 


1. $\beta$ = average change in milk pH associated with a 1-unit increase in temperature. 
2. $H_0$: $\beta = 0$
3. $H_a$: $\beta < 0$
4. $\alpha = 0.01$
5. $t = \frac{b - (\text{hypothesized value})}{s_b} = \frac{b - 0}{s_b}$
6. The data are plotted in the scatterplot below. 


[Scatterplot of milk pH against temperature]


The plot shows a linear pattern, and the vertical spread of points does not appear to be 
changing over the range of x values in the sample. If we assume that the distribution of errors 
at any given x value is approximately normal, then the simple linear regression model seems 
appropriate. 
7. $t = \frac{-0.00731 - 0}{0.000416} = -17.569$
8. df = 14
P-value $= P(t_{14} < -17.569) \approx 0$
9. Since P-value $\approx 0 < 0.01$ we reject $H_0$. We have convincing evidence that the slope of the 
population regression line relating milk pH and temperature is negative. Thus the data 
strongly suggest that there is a negative linear relationship between temperature and pH. 
13.27 a 

[Standardized residual plot: standardized residual against firing frequency]


There are no particularly unusual features in the standardized residual plot. The only slightly 
unusual feature is the point whose standardized residual is -1.83, which is relatively far from 
zero, but not particularly extreme. The plot supports the assumption that the simple linear 
regression model applies. 


b Yes. Since the normal probability plot shows a roughly linear pattern we can conclude that it 
is reasonable to assume that the error distribution is approximately normal. 
13.29 a Letting x = minimum width and y = maximum width, the least-squares regression line is 
$\hat{y} = 0.939 + 0.873x$. 


b The residuals and the standardized residuals are shown in the table below. 


Width Widt Residual 
I 1.8 25 -0.011 
Sa Sa ne IPs 
LL ee ee eee ae 
[BOE eee Ae ea 
-0.609 
ioc -0.509 


2.9 “0.371 
5.1 0.292 


ss 
Nn 
N 
a 
i) 
—I 


18 10.2 10:2 0359 0.748 
19 3:5 365 -0.495 -0.751 


Suu Oban 
ce 
hater 
Cae eiceadd Weor23 589) 
I eral 
Baer 


; 0.577 
CE 0.233 
ey 0.077 
0.413 
: 
0.013 
-0.400 


SS 
USS 
i) 
SS 
— 
wo 


The standardized residual plot is shown below. 
[Standardized residual plot: standardized residual against minimum width]


The standardized residual plot shows that there is one point that is a clear outlier (the point 
whose standardized residual is 3.721). This is the point for product 25. 


c The equation of the least-squares regression line is now $\hat{y} = 0.703 + 0.918x$. 
A computer analysis gives $s_b = 0.065$. Thus the change in slope from 0.873 to 0.918 
expressed in standard deviations is $(0.918 - 0.873)/0.065 = 0.692$. Removal of the point 
resulted in a reasonably substantial change in the equation of the estimated regression line. 


d For every 1-cm increase in minimum width, the mean maximum width is estimated to 
increase by 0.918 cm. 
The intercept would be an estimate of the mean maximum width when the minimum width is 
zero. It is clearly impossible to have a container whose minimum width is zero. 


e 

[Standardized residual plot: standardized residual against minimum width]


The standardized residual plot for the data with the Coke bottle removed is shown above. The 
pattern in this plot suggests that the variances of the y distributions decrease as x increases, 
and therefore that the assumption of constant variance is not valid. 


13.31 a 

[Standardized residual plot: standardized residual against x]


There is one unusually large standardized residual, 2.52, for the point (164.2, 181). The point 
(387.8, 310) would seem to be an influential point, since removing it from the standardized 
residual plot would result in an impossible pattern for a residual plot. (The residual plot is 
likely to be similar in appearance to the standardized residual plot, and the horizontal line at 
zero should be the least-squares regression line for the residual plot. When the point 

(387.8, 310) is removed and thus the point with x coordinate 387.8 is removed from the 
standardized residual plot, the pattern shown amongst the remaining points would seem to 
result in a least-squares regression line for that plot that has a clearly positive slope.) 


Apart from the one point that has a large residual, the arrangement of points in the residual 
plot seems consistent with the simple linear regression model. 


If we include the point with the unusually large standardized residual we might begin to 
suspect that the variances of the y distributions decrease as the x values increase. However, 
from the relatively small number of points included we do not have particularly strong 
evidence that the assumption of constant variance does not apply. 


Suppose we constructed a 95% confidence interval for the mean value of y when x = x*. We 
would then be 95% confident that the mean value of y was within that interval. If we were to 
construct the 95% prediction interval at x = x* we would be 95% confident that an observed y 
value, y*, at that value of x will be within the interval. The 95% confidence level for the 
prediction interval is interpreted as follows. The prediction interval is constructed using a set of 
independent y values for a given set of x values. Imagine this being done a large number of times, 
with the prediction interval at x = x* being calculated for each set of (x,y) points. Imagine, also, 


a large number of y values being selected at x = x*. Then if one interval is chosen at random, and 
one y value is chosen at random, on average 95 times out of 100 the y value will be within the 


interval. 
13.35 


We use $s_{a+bx^*} = s_e\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$. 

Here $s_{a+b(2.0)} = 16.486\sqrt{\frac{1}{20} + \frac{(2.0 - 2.5)^2}{25}} = 4.038$. 


Since 3 is the same distance from 2.5 as is 2, $s_{a+b(3.0)} = s_{a+b(2.0)} = 4.038$. 


$s_{a+b(2.8)} = 16.486\sqrt{\frac{1}{20} + \frac{(2.8 - 2.5)^2}{25}} = 3.817$. 


The estimated standard deviation of $a + bx^*$ is smallest for $x^* = 2.5$, since the distance of 
this value from the mean value of x is zero. 


13.37 

We need to assume that the conditions for inference are met. 
The point estimate of $\alpha + \beta(40)$ is $a + b(40) = 6.843345 - 0.00730608(40) = 6.55110$. 


The estimated standard deviation of $a + b(40)$ is 

$s_{a+b(40)} = s_e\sqrt{\frac{1}{n} + \frac{(40 - \bar{x})^2}{S_{xx}}} = 0.0356\sqrt{\frac{1}{16} + \frac{(40 - 42.375)^2}{7325.75}} = 0.0089547$. 


The critical value of the t distribution with 14 degrees of freedom for a 95% confidence 
interval is 2.145. 
So the required confidence interval is $6.55110 \pm 2.145(0.0089547) = (6.532, 6.570)$. 


We are 95% confident that the mean milk pH when the milk temperature is 40°C is between 
6.532 and 6.570. 


We need to assume that the conditions for inference are met. 
The point estimate of $\alpha + \beta(35)$ is $a + b(35) = 6.843345 - 0.00730608(35) = 6.58763$. 


The estimated standard deviation of $a + b(35)$ is 

$s_{a+b(35)} = s_e\sqrt{\frac{1}{n} + \frac{(35 - \bar{x})^2}{S_{xx}}} = 0.0356\sqrt{\frac{1}{16} + \frac{(35 - 42.375)^2}{7325.75}} = 0.0094138$. 


The critical value of the t distribution with 14 degrees of freedom for a 99% confidence 
interval is 2.977. 
So the required confidence interval is $6.58763 \pm 2.977(0.0094138) = (6.560, 6.616)$. 


We are 99% confident that the mean milk pH when the milk temperature is 35°C is between 
6.560 and 6.616. 


No. Since 90°C is well outside the range of x values in the original data set, this would not be 
advisable. 
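
The two confidence intervals for the mean milk pH can be recomputed from the quantities quoted in the solutions for the milk pH data (n = 16, mean temperature 42.375, S_xx = 7325.75, SSResid = 0.01774, and the unrounded a and b). The following is a sketch rather than the manual's own working; the function names are invented for this example, and the t critical values are the rounded ones quoted above.

```python
import math

# Milk pH quantities quoted in the solutions (s_e recomputed from SSResid with 14 df).
n, x_bar, s_xx = 16, 42.375, 7325.75
s_e = math.sqrt(0.01774 / (n - 2))
a, b = 6.843345, -0.00730608

def sd_of_fitted_mean(x_star):
    """Estimated standard deviation of a + b * x_star."""
    return s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / s_xx)

def mean_interval(x_star, t_critical):
    """Confidence interval for the mean of y at x = x_star."""
    estimate = a + b * x_star
    margin = t_critical * sd_of_fitted_mean(x_star)
    return estimate - margin, estimate + margin

interval_40 = mean_interval(40, 2.145)   # 95% confidence, 14 df
interval_35 = mean_interval(35, 2.977)   # 99% confidence, 14 df
print([round(v, 3) for v in interval_40])
print([round(v, 3) for v in interval_35])
```

The standard deviation is larger at 35 than at 40 because 35 is farther from the mean temperature, which is the same effect seen in Exercise 13.35.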


13.39 a The equation of the estimated regression line is $\hat{y} = -0.001790 - 0.0021007x$, where x = 
mean childhood blood lead level and y = brain volume change. 


b We need to assume that the conditions for inference are met. 
The point estimate of $\alpha + \beta(20)$ is $a + b(20) = -0.001790 - 0.0021007(20) = -0.043804$. 
The estimated standard deviation of $a + b(20)$ is 

$s_{a+b(20)} = s_e\sqrt{\frac{1}{n} + \frac{(20 - \bar{x})^2}{S_{xx}}} = 0.031\sqrt{\frac{1}{100} + \frac{(20 - \bar{x})^2}{S_{xx}}} = 0.0069979$. 


The critical value of the t distribution with 98 degrees of freedom for a 90% confidence 
interval is 1.661. 


So the required confidence interval is $-0.043804 \pm 1.661(0.0069979) = (-0.055, -0.032)$. 


We are 90% confident that the mean brain volume change for people with a childhood blood 
lead level of 20 µg/dL is between -0.055 and -0.032. 


c The estimated standard deviation of the amount by which a single y observation deviates 
from the value predicted by an estimated regression line is 

$\sqrt{s_e^2 + s_{a+b(20)}^2} = \sqrt{0.031^2 + 0.0069979^2} = 0.03178$. 


The critical value of the t distribution with 98 degrees of freedom for a 90% prediction 
interval is 1.661. 


So the required prediction interval is $-0.043804 \pm 1.661(0.03178) = (-0.097, 0.009)$. 


We are 90% confident that the brain volume change for a person with a childhood blood lead 
level of 20 µg/dL will be between -0.097 and 0.009. 


d The answer to Part (b) gives an interval in which we are 90% confident that the mean brain 
volume change for a person with a childhood blood lead level of 20 µg/dL lies. The answer to 
Part (c) states that if we were to find the brain volume change for one person with a 
childhood blood lead level of 20 µg/dL, we are 90% confident that this value will lie within 
the interval found. 
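
The difference between the confidence interval for the mean and the prediction interval for a single observation comes entirely from the extra error variance added under the square root. A minimal sketch using the numbers quoted in this solution (variable names chosen for illustration) makes the comparison explicit:

```python
import math

s_e = 0.031                  # estimated error standard deviation
s_fit = 0.0069979            # estimated standard deviation of a + b(20)
estimate = -0.001790 - 0.0021007 * 20
t_critical = 1.661           # 90% level, 98 df, as quoted in the solution

# Confidence interval for the mean brain volume change at x = 20.
ci = (estimate - t_critical * s_fit, estimate + t_critical * s_fit)

# Prediction interval for a single observation at x = 20.
s_pred = math.sqrt(s_e ** 2 + s_fit ** 2)
pi = (estimate - t_critical * s_pred, estimate + t_critical * s_pred)

print([round(v, 3) for v in ci])
print([round(v, 3) for v in pi])
```

The prediction interval is always the wider of the two, since it has to allow for the variation of an individual observation about the line as well as the uncertainty in the estimated line.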


13.41 

The equation of the regression line is $\hat{y} = -133.02 + 5.919x$, where x = snout-vent length and 
y = clutch size. 


$s_b = 1.127$. 


Yes. Since the estimated slope is positive and since the P-value is small (given as 0.000 in the 
output) we have convincing evidence that the slope of the population regression line is 
positive. 


We need to assume that the conditions for inference are met. 
The point estimate of $\alpha + \beta(65)$ is $a + b(65) = -133.02 + 5.919(65) = 251.715$. 


$S_{xx} = \sum x^2 - n\bar{x}^2 = 45958 - 14(56.5)^2 = 1266.5$. 


The estimated standard deviation of $a + b(65)$ is 

$s_{a+b(65)} = s_e\sqrt{\frac{1}{n} + \frac{(65 - \bar{x})^2}{S_{xx}}} = 33.90\sqrt{\frac{1}{14} + \frac{(65 - 56.5)^2}{1266.5}} = 12.151$. 


Therefore, the estimated standard deviation of the amount by which a single y observation 
deviates from the value predicted by an estimated regression line is 

$\sqrt{s_e^2 + s_{a+b(65)}^2} = \sqrt{33.90^2 + 12.151^2} = 36.012$. 


The critical value of the t distribution with 12 degrees of freedom for a 95% prediction 
interval is 2.179. 
So the required prediction interval is $251.715 \pm 2.179(36.012) = (173.252, 330.178)$. 


We are 95% confident that the clutch size for a salamander whose snout-vent length is 65 will 
be between 173.252 and 330.178. 


It would not be appropriate to use the estimated regression line to predict the clutch size for a 
salamander with a snout-vent length of 105, since 105 is a long way outside the range of the x 
values in the original data set. 


13.43 a The equation of the estimated regression line is $\hat{y} = 2.78551 + 0.04462x$, where x = time on 
shelf and y = moisture content. 


b $\beta$ = slope of the population regression line relating moisture content to shelf time. 
$H_0$: $\beta = 0$
$H_a$: $\beta \neq 0$
$\alpha = 0.05$
$t = \frac{b - (\text{hypothesized value})}{s_b} = \frac{b - 0}{s_b}$

A standardized residual plot is shown below. 

[Standardized residual plot: standardized residual against shelf time (days)]


Apart from one outlier, the standardized residual plot shows a random pattern that is 
consistent with the simple regression model. 


$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 7745 - \frac{(269)^2}{14} = 2576.357$, 
$s_e = 0.196246$, 
$s_b = \frac{s_e}{\sqrt{S_{xx}}} = 0.00411$. 
$t = \frac{0.04462 - 0}{0.00411} = 10.848$
df = 12


P-value $= 2 \cdot P(t_{12} > 10.848) \approx 0$
Since P-value $\approx 0 < 0.05$ we reject $H_0$. We have convincing evidence that the simple 
regression model provides useful information for predicting moisture content from 
knowledge of shelf time. 


c The conditions for inference were checked in Part (b). 
The point estimate of $\alpha + \beta(30)$ is $a + b(30) = 2.78551 + 0.04462(30) = 4.124$. 


$S_{xx} = \sum x^2 - n\bar{x}^2 = 7745 - 14(269/14)^2 = 2576.357$. 
$s_e = 0.196246$. 
The estimated standard deviation of $a + b(30)$ is 

$s_{a+b(30)} = s_e\sqrt{\frac{1}{n} + \frac{(30 - \bar{x})^2}{S_{xx}}} = 0.196246\sqrt{\frac{1}{14} + \frac{(30 - 269/14)^2}{2576.357}} = 0.067006$. 


Therefore, the estimated standard deviation of the amount by which a single y observation 
deviates from the value predicted by an estimated regression line is 

$\sqrt{s_e^2 + s_{a+b(30)}^2} = \sqrt{0.196246^2 + 0.067006^2} = 0.207$. 


The critical value of the t distribution with 12 degrees of freedom for a 95% prediction 
interval is 2.179. 

So the required prediction interval is $4.124 \pm 2.179(0.207) = (3.672, 4.576)$. 

We are 95% confident that the moisture content for a box of cereal that has been on the shelf 
for 30 days will be between 3.672 and 4.576 percent. 


d Since 4.1 is included in the interval constructed in Part (c), a moisture content 
exceeding 4.1 percent is quite plausible when the shelf time is 30 days. 

13.45 a The scatterplot in Example 5.2 shows a linear pattern that is consistent with the assumptions 
of the simple linear regression model. 
The point estimate of $\alpha + \beta(0.5)$ is $a + b(0.5) = -1.59 + 2.59(0.5) = -0.295$. 


$s_e = \sqrt{\frac{\text{SSResid}}{n-2}} = \sqrt{\frac{1.936}{30}} = 0.254$


The estimated standard deviation of $a + b(0.5)$ is 

$s_{a+b(0.5)} = s_e\sqrt{\frac{1}{n} + \frac{(0.5 - \bar{x})^2}{S_{xx}}} = 0.254\sqrt{\frac{1}{32} + \frac{(0.5 - \bar{x})^2}{1.479}} = 0.05015$. 


The critical value of the t distribution with 30 degrees of freedom for a 95% confidence 
interval is 2.042. 

So the required confidence interval is $-0.295 \pm 2.042(0.05015) = (-0.397, -0.193)$. 

We are 95% confident that the mean perceived astringency score when the tannin 
concentration is 0.5 is between -0.397 and -0.193. 


b We need 95% confidence intervals for the mean astringency ratings at both x values. We 
already have the confidence interval for x = 0.5, so we only need to calculate the interval for 
x = 0.7. 
As stated in the solution to Part (a), the scatterplot in Example 5.2 shows a linear pattern that 
is consistent with the assumptions of the simple linear regression model. 
The point estimate of $\alpha + \beta(0.7)$ is $a + b(0.7) = -1.59 + 2.59(0.7) = 0.223$. 


$s_e = \sqrt{\frac{\text{SSResid}}{n-2}} = \sqrt{\frac{1.936}{30}} = 0.254$. 

The estimated standard deviation of $a + b(0.7)$ is 

$s_{a+b(0.7)} = s_e\sqrt{\frac{1}{n} + \frac{(0.7 - \bar{x})^2}{S_{xx}}} = 0.254\sqrt{\frac{1}{32} + \frac{(0.7 - \bar{x})^2}{1.479}} = 0.04894$. 


The critical value of the t distribution with 30 degrees of freedom for a 95% confidence 
interval is 2.042. 
So the required confidence interval is $0.223 \pm 2.042(0.04894) = (0.123, 0.323)$. 


We are 95% confident that the mean perceived astringency score when the tannin 
concentration is 0.7 is between 0.123 and 0.323. 


c The simultaneous confidence level would be [100 - 2(1)]% = 98%. 


d The simultaneous confidence level would be [100 - 3(5)]% = 85%. 


13.47 The statistic $r$ is the correlation coefficient for a sample, while $\rho$ denotes the correlation 
coefficient for the population. 


13.49 1. $\rho$ = the correlation between teaching evaluation index and annual raise for the population 
from which the sample was selected. 
2. $H_0$: $\rho = 0$
3. $H_a$: $\rho \neq 0$
4. $\alpha = 0.05$
5. $t = \frac{r}{\sqrt{\frac{1-r^2}{n-2}}}$
6. We must assume that the variables have a bivariate normal distribution and that the sample 
was a random sample from the population. 
7. $t = \frac{0.11}{\sqrt{\frac{1 - 0.11^2}{351}}} = 2.073$
8. df = 351
P-value $= 2 \cdot P(t_{351} > 2.073) = 0.039$
9. Since P-value = 0.039 < 0.05 we reject $H_0$. We have convincing evidence of a linear 
association between teaching evaluation index and annual raise. 


This result might be initially surprising, since 0.11 seems to be a relatively small value for the 
sample correlation coefficient. However, what the result shows is that for a sample size as large 
as 353, a sample correlation as large as 0.11 would be very unlikely if the population correlation 
were zero. 
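
The remark about sample size can be made concrete by evaluating the test statistic for the same sample correlation at two sample sizes. In the sketch below, the sample size of 30 is a hypothetical value chosen only for contrast, and the function name is introduced for this example.

```python
import math

def correlation_t(r, n):
    """t statistic for testing rho = 0; it has n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

t_large = correlation_t(0.11, 353)   # the sample size in this exercise
t_small = correlation_t(0.11, 30)    # hypothetical smaller sample, for contrast
print(round(t_large, 3), round(t_small, 3))
```

With 353 observations the statistic exceeds 2, whereas the same correlation in a sample of 30 gives a value well below 1, which would not be close to significant at the 0.05 level.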


13.51 a

1. $\rho$ = the correlation between time spent watching television and grade point average for
the population from which the sample was selected.
2. $H_0: \rho = 0$
3. $H_a: \rho < 0$
4. $\alpha = 0.01$
5. \[ t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} \]
6. We must assume that the variables have a bivariate normal distribution. We are told that
the sample was a random sample.
7. \[ t = \frac{-0.26}{\sqrt{\frac{1 - (-0.26)^2}{526}}} = -6.175 \]
8. df = 526
P-value = $P(t_{526} < -6.175) \approx 0$
9. Since P-value ≈ 0 < 0.01 we reject $H_0$. We have convincing evidence of a negative
correlation between time spent watching television and grade point average.


b  Since $r^2 = (-0.26)^2 = 0.0676$, only 6.76% of the observed variation in grade point average
would be explained by the regression line. This is not a substantial percentage. 


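13.53 1. $\rho$ = the correlation between surface and subsurface concentration.
2. $H_0: \rho = 0$
3. $H_a: \rho \ne 0$
4. $\alpha = 0.05$
5. \[ t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} \]
6. We must assume that the sample was a random sample from the population under
consideration.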


[Normal probability plots: Surface versus Normal Score, and Subsurface versus Normal Score]


The curved pattern in the first normal probability plot tells us that it is unlikely that the
variables have a bivariate normal distribution, but we will nevertheless proceed with the
hypothesis test.
7. r = 0.574
\[ t = \frac{0.574}{\sqrt{\frac{1 - 0.574^2}{7}}} = 1.855 \]
8. df = 7
P-value = $2 \cdot P(t_7 > 1.855) = 0.106$
9. Since P-value = 0.106 > 0.05 we do not reject $H_0$. We do not have convincing evidence of a
linear relationship between surface and subsurface concentration.


13.55 a The slope of the estimated regression line for y = verbal language score against x = height 
gain from age 11 to 16 is 2.0. This tells us that for each extra inch of height gain the average 
verbal language score at age 11 increased by 2.0 percentage points. The equivalent results for 
nonverbal language scores and math scores were 2.3 and 3.0. Thus the reported slopes are 
consistent with the statement that each extra inch of height gain was associated with an 
increase in test scores of between 2 and 3 percentage points. 


b  The slope of the estimated regression line for y = verbal language score against x = height
gain from age 16 to 33 is -3.1. This tells us that for each extra inch of height gain the average
verbal language score at age 11 decreased by 3.1 percentage points. The equivalent results for
nonverbal language scores and math scores were both -3.8. Thus the reported slopes are
consistent with the statement that each extra inch of height gain was associated with a
decrease in test scores of between 3.1 and 3.8 percentage points.


c  Between the ages of 11 and 16 the first boy grew 5 inches more than the second boy. So the
first boy's age 11 math score is predicted to be 5(3) = 15 percentage points higher than that of
the second boy. Between the ages of 16 and 33 the second boy grew 5 inches more than the
first boy. According to this information the first boy's age 11 math score is predicted to be
5(3.8) = 19 percentage points higher than that of the second boy. These two results are
consistent with the conclusion that on the whole boys who did their growing early had higher
cognitive scores at age 11 than those whose growth occurred later.


13.57 a
\[ t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{-0.18}{\sqrt{\frac{1 - (-0.18)^2}{345}}} = -3.399. \]
Thus, for a two-tailed test, the P-value is $2 \cdot P(t_{345} < -3.399) = 0.001$. Since the P-value for a
one-tailed test would be a half of this, it is indeed correct, whether this be a one- or two-tailed
test, that P-value < 0.05.


b  Yes. One would expect, generally speaking, that those with greater coping humor ratings
would have smaller depression ratings.


c  No. Since $r^2 = (-0.18)^2 = 0.0324$, we know that only 3.2% of the variation in depression
scale values is attributable to the approximate linear relationship with the coping humor scale.
So the linear regression model will generally not give accurate predictions.


13.59 a

1. $\rho$ = the correlation between soil hardness and trail length for the population of penguin
burrows.
2. $H_0: \rho = 0$
3. $H_a: \rho < 0$
4. $\alpha = 0.05$
5. \[ t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} \]
6. We must assume that the variables have a bivariate normal distribution and that the
sample was a random sample of penguin burrows.
7. $r = -\sqrt{0.386} = -0.621$. (We know that r < 0 since the slope of the least-squares line is
negative.)
\[ t = \frac{-0.621}{\sqrt{\frac{1 - (-0.621)^2}{59}}} = -6.090 \]
8. df = 59
P-value = $P(t_{59} < -6.090) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence of a negative
correlation between soil hardness and trail length.


b  We need to assume that the conditions for inference are met.
The point estimate of $\alpha + \beta(6.0)$ is $a + b(6.0) = 11.607 - 1.4187(6.0) = 3.0948$.

The estimated standard deviation of $a + b(6.0)$ is

\[ s_{a+b(6.0)} = s_e \sqrt{\frac{1}{n} + \frac{(6.0 - \bar{x})^2}{S_{xx}}}. \]

Therefore, the estimated standard deviation of the amount by which a single y observation
deviates from the value predicted by an estimated regression line is

\[ \sqrt{s_e^2 + s_{a+b(6.0)}^2} = 2.380. \]

The critical value of the t distribution with 59 degrees of freedom for a 95% confidence
interval is 2.001.
So the required prediction interval is 3.0948 ± 2.001(2.380) = (-1.667, 7.856).

We are 95% confident that the trail length when the soil hardness is 6.0 will be between
-1.667 and 7.856.


c  No. For x = 10 the least-squares line predicts y = -2.58. Since it is not possible to have a
negative trail length, it is clear that the simple linear regression model does not apply at
x = 10. So the simple linear regression model is not suitable for this prediction.


13.61 a 1. $\beta$ = slope of the population regression line relating x = age and y = percentage of the
cribriform area of the lamina scleralis occupied by pores.
2. $H_0: \beta = -0.5$
3. $H_a: \beta \ne -0.5$
4. $\alpha = 0.1$
5. \[ t = \frac{b - (\text{hypothesized value})}{s_b} = \frac{b - (-0.5)}{s_b} \]
6. The data are plotted in the scatterplot below.

[Scatterplot of y versus x; x axis from 20 to 80]


The plot shows a linear pattern, and the vertical spread of points does not appear to be 
changing over the range of x values in the sample. If we assume that the distribution of 
errors at any given x value is approximately normal, then the simple linear regression 
model seems appropriate. 


7. $b = -0.447$, $s_e = 6.75598$, $S_{xx} = 3797.529$
\[ s_b = \frac{s_e}{\sqrt{S_{xx}}} = \frac{6.75598}{\sqrt{3797.529}} = 0.1096 \]
\[ t = \frac{-0.447 - (-0.5)}{0.1096} = 0.488 \]
8. df = 15
P-value = $2 \cdot P(t_{15} > 0.488) = 0.633$
9. Since P-value = 0.633 > 0.1 we do not reject $H_0$. We do not have convincing evidence
that the average decrease in percentage area associated with a 1-year age increase is not
0.5.


b As shown in the solution to Part (a), the scatterplot shows a linear pattern that is consistent 
with the assumptions of the simple linear regression model. 


The point estimate of $\alpha + \beta(50)$ is $a + b(50) = 72.918 - 0.447(50) = 50.591$.
The estimated standard deviation of $a + b(50)$ is

\[ s_{a+b(50)} = s_e \sqrt{\frac{1}{n} + \frac{(50 - \bar{x})^2}{S_{xx}}} = 6.75598 \sqrt{\frac{1}{17} + \frac{(50 - \bar{x})^2}{3797.529}} = 1.649. \]

The critical value of the t distribution with 15 degrees of freedom for a 95% confidence
interval is 2.131.

So the required confidence interval is 50.591 ± 2.131(1.649) = (47.076, 54.106).
We are 95% confident that the mean percentage area at age 50 is between 47.076 and 54.106. 
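
The slope test in Part (a) of 13.61 can be sketched as follows. The function names are hypothetical, and the inputs are the rounded values printed in the solution, so the t statistic comes out near 0.483 rather than the 0.488 obtained from unrounded values (see the note about rounding at the front of the manual).

```python
import math


def slope_std_error(s_e, s_xx):
    """Estimated standard deviation of the slope: s_b = s_e / sqrt(S_xx)."""
    return s_e / math.sqrt(s_xx)


def slope_t(b, hypothesized, s_b):
    """t statistic for testing a hypothesized value of the population slope."""
    return (b - hypothesized) / s_b


s_b = slope_std_error(6.75598, 3797.529)
t_stat = slope_t(-0.447, -0.5, s_b)
print(round(s_b, 4), round(t_stat, 3))
```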


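13.63 For leptodactylus:
SSResid = 0.30989
Sample size = 9
b = 0.31636
$S_{xx}$ = 42.82


For bufa:
SSResid = 0.12792
Sample size = 8
b = 0.35978
$S_{xx}$ = 34.54875


\[ s^2 = \frac{0.30989 + 0.12792}{9 + 8 - 4} = 0.03368 \]
$H_0: \beta_1 = \beta_2$
$H_a: \beta_1 \ne \beta_2$
$\alpha = 0.05$
\[ t = \frac{b_1 - b_2}{\sqrt{\frac{s^2}{S_{xx_1}} + \frac{s^2}{S_{xx_2}}}} = \frac{0.31636 - 0.35978}{\sqrt{\frac{0.03368}{42.82} + \frac{0.03368}{34.54875}}} = -1.03457 \]
df = 13


P-value = $2 \cdot P(t_{13} < -1.03457) = 0.320$
Since P-value = 0.320 > 0.05 we do not reject $H_0$. We do not have convincing evidence that the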
slopes of the population regression lines for the two different frog populations are not equal. 
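
A minimal sketch of the two-slope comparison in 13.63, assuming the pooled variance is (SSResid1 + SSResid2)/(n1 + n2 - 4), which reproduces the 0.03368 and 13 degrees of freedom quoted above. The function name `compare_slopes` is illustrative only.

```python
import math


def compare_slopes(ss1, n1, b1, sxx1, ss2, n2, b2, sxx2):
    """Return (pooled variance, df, t) for testing equality of two regression slopes."""
    df = n1 + n2 - 4                       # two parameters estimated per line
    pooled = (ss1 + ss2) / df
    t = (b1 - b2) / math.sqrt(pooled / sxx1 + pooled / sxx2)
    return pooled, df, t


pooled, df, t_stat = compare_slopes(0.30989, 9, 0.31636, 42.82,
                                    0.12792, 8, 0.35978, 34.54875)
print(round(pooled, 5), df, round(t_stat, 4))
```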


© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.




13.65 If the point (20, 33000) is not included, then the slope of the least-squares line would be relatively
small and negative (appearing close to horizontal when drawn to the scales of the scatterplot 
given in the question). If the point is included then the slope of the least-squares line would still 
be negative, but much further from zero. 


13.67 The small P-value indicates that there is convincing evidence of a useful linear relationship 
between percentage raise and productivity. 


13.69 a  The values $e_1, \ldots, e_n$ are the vertical deviations of the y observations from the population
regression line. The residuals are the vertical deviations from the sample regression line.


b  False. The simple linear regression model states that the mean value of y is equal to $\alpha + \beta x$.


c  No. You only test hypotheses about population characteristics; b is a sample statistic.

d  Strictly speaking this statement is false, since a set of points lying exactly on a straight line
will give a zero result for SSResid. However, it is certainly true to say that, since SSResid is a
sum of squares, its value must be nonnegative.


e This is not possible, since the sum of the residuals is always zero. 


f This is not possible, since SSResid (here said to be equal to 731) is always less than or equal 
to SSTo (here said to be 615). 
Cumulative Review Exercises


CR13.1 


Randomly assign the 400 students to two groups of equal size, Group A and Group B. (This can 
be done by writing the names of the students onto slips of paper, placing the slips into a hat, and 
picking 200 at random. These 200 people will go into Group A, and the remaining 200 people 
will go into Group B.) Have the 400 students take the same course, attending the same lectures 
and being given the same homework assignments. The only difference between the two groups 
should be that the students in Group A should be given daily quizzes and the students in Group B 
should not. (This could be done by having the students in Group A take their quizzes in class after 
the students in Group B have been dismissed.) After the final exam the exam scores for the 
students in Group A should be compared to the exam scores for the students in Group B. 


CR13.3 
a Median = 2 
Lower quartile = 1.5 
Upper quartile = 6.5 
IQR = 6.5 - 1.5 = 5


[Boxplot of Number of Fines; axis from 0 to 40]


Two of the observations, 23 and 36, are (extreme) outliers. 
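
The outlier classification can be checked with a small sketch, assuming the usual boxplot convention that an observation is a mild outlier if it lies more than 1.5 IQR beyond the nearest quartile and an extreme outlier if it lies more than 3 IQR beyond it. The function name `outlier_fences` is hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the mild (1.5 IQR) and extreme (3 IQR) outlier cutoffs."""
    iqr = q3 - q1
    return {
        "mild": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "extreme": (q1 - 3 * iqr, q3 + 3 * iqr),
    }


# Quartiles from Part (a): lower quartile 1.5, upper quartile 6.5.
fences = outlier_fences(1.5, 6.5)
print(fences)
print([x for x in (23, 36) if x > fences["extreme"][1]])
```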


b The two airlines with the highest numbers of fines assessed may not be the worst in terms of 
maintenance violations since these airlines might have more flights than the other airlines. 


CR13.5 
a Check of Conditions 

1. Since $n\hat{p} = 1003(0.68) = 682 \ge 10$ and $n(1 - \hat{p}) = 1003(0.32) = 321 \ge 10$, the sample size
is large enough.

2. The sample size of n = 1003 is much smaller than 10% of the population size (the number
of adult Americans).

3. We are told that the survey was nationally representative, so it is reasonable to regard the
sample as a random sample from the population of adult Americans.

Calculation

The 95% confidence interval for p is

\[ \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.68 \pm 1.96\sqrt{\frac{(0.68)(0.32)}{1003}} = (0.651, 0.709). \]

Interpretation
We are 95% confident that the proportion of all adult Americans who view a landline phone
as a necessity is between 0.651 and 0.709. 
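
The interval in Part (a) can be reproduced with a short sketch of the large-sample formula. The function name `proportion_interval` is an illustrative assumption, and 1.96 is the z critical value for 95% confidence.

```python
import math


def proportion_interval(p_hat, n, z_crit=1.96):
    """Large-sample confidence interval for a population proportion."""
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_interval(0.68, 1003)
print(round(low, 3), round(high, 3))
```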


b 1. p = proportion of all adult Americans who considered a TV set a necessity
2. $H_0: p = 0.5$
3. $H_a: p > 0.5$
4. $\alpha = 0.05$
5. \[ z = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{\hat{p} - 0.5}{\sqrt{\frac{(0.5)(0.5)}{1003}}} \]
6. The sample was nationally representative, so it is reasonable to treat the sample as a
random sample from the population. The sample size is much smaller than the population
size (the number of adult Americans). Furthermore, $np = 1003(0.5) = 501.5 \ge 10$ and
$n(1 - p) = 1003(0.5) = 501.5 \ge 10$, so the sample is large enough. Therefore the large
sample test is appropriate.
7. \[ z = \frac{0.52 - 0.5}{\sqrt{\frac{(0.5)(0.5)}{1003}}} = 1.267 \]
8. P-value = $P(z > 1.267) = 0.103$
9. Since P-value = 0.103 > 0.05 we do not reject $H_0$. We do not have convincing evidence
that a majority of adult Americans consider a TV set a necessity.


c 1. $p_1$ = proportion of adult Americans in 2003 who regarded a microwave oven as a
necessity
$p_2$ = proportion of adult Americans in 2009 who regarded a microwave oven as a
necessity
2. $H_0: p_1 - p_2 = 0$
3. $H_a: p_1 - p_2 > 0$
4. $\alpha = 0.01$
5. \[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \frac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} \]
6. We are told that the 2009 survey was nationally representative, so it is reasonable to treat
the sample in 2009 as a random sample from the population. We need to assume that the
sample in 2003 was a random sample from the population. Also,
$n_1\hat{p}_1 = 1003(0.68) = 682 \ge 10$, $n_1(1 - \hat{p}_1) = 1003(0.32) = 321 \ge 10$,
$n_2\hat{p}_2 = 1003(0.47) = 471 \ge 10$, and $n_2(1 - \hat{p}_2) = 1003(0.53) = 532 \ge 10$, so the samples are
large enough.
7. \[ \hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{1003(0.68) + 1003(0.47)}{1003 + 1003} = 0.575 \]
\[ z = \frac{0.68 - 0.47}{\sqrt{\frac{(0.575)(0.425)}{1003} + \frac{(0.575)(0.425)}{1003}}} = 9.513 \]
8. P-value = $P(z > 9.513) \approx 0$
9. Since P-value ≈ 0 < 0.01 we reject $H_0$. We have convincing evidence that the proportion
of adult Americans who regarded a microwave oven as a necessity decreased between
2003 and 2009. 
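
A minimal sketch of the two-sample z statistic with a combined (pooled) proportion, as used in Part (c). The function name `two_proportion_z` is hypothetical; the inputs are the sample proportions and sample sizes quoted in the solution.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Return (combined proportion, z) for testing p1 - p2 = 0."""
    p_c = (n1 * p1 + n2 * p2) / (n1 + n2)
    std_error = math.sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    return p_c, (p1 - p2) / std_error


p_c, z_stat = two_proportion_z(0.68, 1003, 0.47, 1003)
print(round(p_c, 3), round(z_stat, 3))
```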




CR13.7 
a P(x=0)=1-0.38=0.62. 


b  P(2<x<5) = 0.5(0.38) = 0.19.
P(x > 5) = 0.18(0.38) = 0.0684.
So P(x = 1) = 0.38 - 0.19 - 0.0684 = 0.1216.


c  0.19

d 0.0684 
CR13.9 

a  [Scatterplot of Number of Songs versus Number of Months]


Yes, the relationship looks approximately linear. 


b  The equation of the estimated regression line is $\hat{y} = -12.887 + 21.126x$, where x = number of
months the user has owned the MP3 player and y = number of songs stored. 
c  [Standardized residual plot: Standardized Residual versus Number of Months]


There is a random pattern in the standardized residual plot, and there is no suggestion that the 
variance of y is not the same at each x value. There are no outliers. The assumptions of the 
simple linear regression model would therefore seem to be reasonable. 


d 1. $\beta$ = slope of the population regression line relating the number of songs to the number of
months.
2. $H_0: \beta = 0$
3. $H_a: \beta \ne 0$
4. $\alpha = 0.05$
5. \[ t = \frac{b - (\text{hypothesized value})}{s_b} = \frac{b - 0}{s_b} \]
6. As explained in Part (c), the assumptions of the simple linear regression model seem to
be reasonable.
7. $s_b = 0.994$
\[ t = \frac{21.126}{0.994} = 21.263 \]
8. df = 13
P-value = $2 \cdot P(t_{13} > 21.263) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence of a useful linear
relationship between the number of songs stored and the number of months the MP3
player has been owned.


CR13.11 


[Table of observed counts with expected counts in parentheses; the only legible entry is 409 (375.647).]


$H_0$: The proportions falling in each of the response categories are the same for the three years.
$H_a$: $H_0$ is not true.


$\alpha = 0.05$
\[ X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}} \]
The samples were considered to be representative of the populations of undergraduates for the
given years, and so it is reasonable to assume that they were random samples from those
populations. All the expected counts are greater than 5.


\[ X^2 = \frac{(397 - 379.706)^2}{379.706} + \cdots + \frac{(48 - 51.813)^2}{51.813} = 26.175 \]

di= 0 

P-value = $P(\chi^2 > 26.175) \approx 0$

Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that the distribution of
political affiliation is not the same for all three years for which the data are given.


CR13.13 


[Table of observed counts, with expected counts in parentheses, by region of residence and
credit card status. Legible rows: 408 (397.895), 115 (125.105); 104 (96.621), 23 (30.379).]


$H_0$: Region of residence and having a credit card are independent
$H_a$: Region of residence and having a credit card are not independent
$\alpha = 0.05$


\[ X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}} \]
We are told that the sample was a random sample of undergraduates in the US. All the expected
counts are greater than 5.
\[ X^2 = \frac{(401 - 429.848)^2}{429.848} + \cdots + \frac{(23 - 30.379)^2}{30.379} = 15.106 \]
df = 3
P-value = $P(\chi^2_3 > 15.106) = 0.002$


Since P-value = 0.002 < 0.05 we reject $H_0$. We have convincing evidence that region of residence
and having a credit card are not independent.


CR13.15
1. $\mu_1$ = mean alkalinity upstream
$\mu_2$ = mean alkalinity downstream
2. $H_0: \mu_1 - \mu_2 = -50$
3. $H_a: \mu_1 - \mu_2 < -50$
4. $\alpha = 0.05$
5. \[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\text{hypothesized value})}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - (-50)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
6. We need to assume that the water specimens were chosen randomly from the two locations.
We are given that $n_1 = 24$ and $n_2 = 24$, so neither sample size was greater than or equal to
30. We therefore need to assume that the distributions of alkalinity at the two locations are
approximately normal.
7. \[ t = \frac{(75.9 - 183.6) - (-50)}{\sqrt{\frac{s_1^2}{24} + \frac{s_2^2}{24}}} = -113.169 \]
8. df = 45.752
P-value = $P(t_{45.752} < -113.169) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that the mean alkalinity
is higher downstream than upstream by more than 50 mg/L.


CR13.17
1. Let $p_1, \ldots, p_8$ be the proportions of homing pigeons choosing the eight given directions.
2. $H_0: p_1 = \cdots = p_8 = 0.125$
3. $H_a$: $H_0$ is not true
4. $\alpha = 0.1$
5. \[ X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}} \]
6. We need to assume that the study was performed using a random sample of homing pigeons.
All the expected counts are greater than 5.
7. Each expected cell count is 15, and summing $(\text{observed cell count} - 15)^2/15$ over the eight cells gives $X^2 = 4.8$.
8. df = 7
P-value = $P(\chi^2_7 > 4.8) = 0.684$
9. Since P-value = 0.684 > 0.1 we do not reject $H_0$. We do not have convincing evidence that
the birds exhibit a preference.
Chapter 14
Multiple Regression Analysis


Note: In this chapter, numerical answers to questions involving the normal, t, chi square, and F 
distributions were found using values from a calculator. Students using statistical tables will find that 
their answers differ slightly from those given. 


14.1 An example of a deterministic model is $y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3$. This is a deterministic model,
because, for any given values of $x_1$, $x_2$, and $x_3$, the value of y is known. An example of a
probabilistic model is $y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + e$. The error term, e, is a random variable: we
do not know what value it is going to take. Consequently, y, too, is a random variable.


14.3 a  The population regression function is $30 + 0.90x_1 + 0.08x_2 - 4.50x_3$.


b  The population regression coefficients are 0.90, 0.08, and -4.50.


When dynamic hand grip endurance and trunk extension ratio are fixed, the mean increase in
rating of acceptable load associated with a 1-cm increase in extent of left lateral bending is
0.90 kg.


When extent of left lateral bending and dynamic hand grip endurance are fixed, the mean
decrease in rating of acceptable load associated with a 1-N/kg increase in trunk extension
ratio is 4.50 kg.


Mean of y = 30 + 0.90(25) + 0.08(200) - 4.50(10) = 23.5 kg.


For these values of the independent variables, the distribution of y is normal, with mean 23.5
and standard deviation 5. We require

\[ P(13.5 < y < 33.5) = P\left(\frac{13.5 - 23.5}{5} < z < \frac{33.5 - 23.5}{5}\right) = P(-2 < z < 2) = 0.9545. \]


14.5 a  When $x_1 = 20$ and $x_2 = 50$, mean weight = -21.658 + 0.828(20) + 0.373(50) = 13.552 g.


When length is fixed, the mean increase in weight associated with a 1-mm increase in width
is 0.828 g.
When width is fixed, the mean increase in weight associated with a 1-mm increase in length
is 0.373 g.


14.7 a  Mean yield = 415.11 - 6.6(20) - 4.5(40) = 103.11.
Mean yield = 415.11 - 6.6(18.9) - 4.5(43) = 96.87.


When the average percentage of sunshine is fixed, the mean decrease in yield associated with
a 1-degree increase in average temperature is 6.60.

When the average temperature is fixed, the mean decrease in yield associated with a one
percentage point increase in average percentage of sunshine is 4.50.


14.9 a  [Table of x values and mean of y; the legible mean-of-y entries are 456, 526, 564, 570, 544.]


[Plot of Mean of y versus x]


b  The values calculated in Part (a) show us that the chlorine content is greater for a degree of
delignification value of 10 than for a degree of delignification value of 8.


c  When x = 9, mean of y = 571.


When degree of delignification increases from 8 to 9, mean chlorine content increases by 7.
When degree of delignification increases from 9 to 10, mean chlorine content decreases by 1.


14.11 a  [Graph of Mean y versus x1]


b  [Graph of Mean y versus x1; horizontal axis from 0 to 30]


c  The fact that there is no interaction between $x_1$ and $x_2$ is reflected by the fact that in each of
the graphs, the lines are parallel.


d  [Graph of Mean y versus x1; horizontal axis from 0 to 30]


The presence of an interaction term causes the lines in the graphs to be nonparallel.


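14.13 a  $y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + e$


b  $y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_1^2 + \beta_5x_2^2 + \beta_6x_3^2 + e$


c  $y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_1x_2 + e$
$y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_1x_3 + e$
$y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_2x_3 + e$


d  $y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_1^2 + \beta_5x_2^2 + \beta_6x_3^2 + \beta_7x_1x_2 + \beta_8x_1x_3 + \beta_9x_2x_3 + e$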


14.15 a  We need additional variables $x_3$, $x_4$, and $x_5$. The values of these variables could be defined
as shown in the table.


|            | x3 | x4 | x5 |
| Subcompact | 0  | 0  | 0  |
| Compact    | 1  | 0  | 0  |
| Midsize    | 0  | 1  | 0  |
| Large      | 0  | 0  | 1  |


The model equation is $y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 + \beta_5x_5 + e$


b The additional predictors are x,x;, x,x,, and x,x5. 
14.17 a P-value= P(F;,; > 4.23) =0.024. 

b P-value = P(F, \g > 1.95) = 0.146. 

c P-value = P(F; 5) > 4.10) = 0.010.


d P-value = P(F, 4; > 4.58) = 0.004. 
14.19 1. The model is $y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + e$ where y = surface area, $x_1$ = weight,
$x_2$ = width, and $x_3$ = length.
2. $H_0: \beta_1 = \beta_2 = \beta_3 = 0$
3. $H_a$: At least one of the $\beta_i$'s is not zero.
4. $\alpha = 0.05$
5. \[ F = \frac{R^2/k}{(1 - R^2)/(n - (k + 1))} \]
6. Since we do not have the original data set we are unable to check the conditions. We need
to assume that the variables are related according to the model given above, and that the
random deviations, e, are normally distributed with mean zero and fixed standard
deviation.
7. \[ F = \frac{0.996/3}{(1 - 0.996)/146} = 12118 \]
8. P-value = $P(F_{3,146} > 12118) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that the multiple
regression model is useful.


Since the P-value is small and $R^2$ is close to 1 there is strong evidence that the model is
useful.


The model in Part (b) should be recommended, since adding the variables $x_1$ and $x_2$ to the
model (to obtain the model in Part (a)) only increases the value of $R^2$ a small amount (from
0.994 to 0.996).


14.21 1. The model is $y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 + \beta_5x_5 + \beta_6x_6 + e$, where y = species richness,
$x_1$ = watershed area, $x_2$ = shore width, $x_3$ = drainage, $x_4$ = water color, $x_5$ = sand
percentage, and $x_6$ = alkalinity.
2. $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$
3. $H_a$: At least one of the $\beta_i$'s is not zero.
4. $\alpha = 0.01$
5. \[ F = \frac{R^2/k}{(1 - R^2)/(n - (k + 1))} \]
6. Since we do not have the original data set we are unable to check the conditions. We need to
assume that the variables are related according to the model given above, and that the random
deviations, e, are normally distributed with mean zero and fixed standard deviation.
7. \[ F = \frac{0.83/6}{(1 - 0.83)/30} = 24.412 \]
8. P-value = $P(F_{6,30} > 24.412) \approx 0$
9. Since P-value ≈ 0 < 0.01 we reject $H_0$. We have convincing evidence that the chosen model
is useful.
14.23 1. The model is $y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 + e$, where y = fish intake, $x_1$ = water
temperature, $x_2$ = number of pumps running, $x_3$ = sea state, and $x_4$ = speed.
2. $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
3. $H_a$: At least one of the $\beta_i$'s is not zero.
4. $\alpha = 0.1$
5. \[ F = \frac{\text{SSRegr}/k}{\text{SSResid}/(n - (k + 1))} \]
6. Since we do not have the original data set we are unable to check the conditions. We need to
assume that the variables are related according to the model given above, and that the random
deviations, e, are normally distributed with mean zero and fixed standard deviation.
7. \[ F = \frac{1486.9/4}{2230.2/21} = 3.500 \]
8. P-value = $P(F_{4,21} > 3.500) = 0.024$
9. Since P-value = 0.024 < 0.1 we reject $H_0$. We have convincing evidence that the model is
useful.


14.25 1. The model is $y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 + \beta_5x_5 + \beta_6x_6 + \beta_7x_7 + \beta_8x_8 + \beta_9x_9 + e$, where
y = ecology score, $x_1$ = age times 10, $x_2$ = income, $x_3$ = gender, $x_4$ = race, $x_5$ = number of
years of education, $x_6$ = ideology, $x_7$ = social class, $x_8$ = postmaterialist (0 or 1), and $x_9$ =
materialist (0 or 1).
2. $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = \beta_8 = \beta_9 = 0$
3. $H_a$: At least one of the $\beta_i$'s is not zero.
4. $\alpha = 0.05$
5. \[ F = \frac{R^2/k}{(1 - R^2)/(n - (k + 1))} \]
6. Since we do not have the original data set we are unable to check the conditions. We need to
assume that the variables are related according to the model given above, and that the random
deviations, e, are normally distributed with mean zero and fixed standard deviation.
7. We have n = 1136 and k = 9. So
\[ F = \frac{0.06/9}{(1 - 0.06)/1126} = 7.986. \]
8. P-value = $P(F_{9,1126} > 7.986) \approx 0$
9. Since P-value ≈ 0 < 0.05 we reject $H_0$. We have convincing evidence that the multiple
regression model is useful. 
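
The model utility F statistics in Exercises 14.19 to 14.25 all follow one of two equivalent forms, which can be sketched as below. The function names are hypothetical, and the error degrees of freedom n - (k + 1) are passed in directly, using the values that appear in the denominators above.

```python
def f_from_r2(r2, k, error_df):
    """F = (R^2 / k) / ((1 - R^2) / error_df), where error_df = n - (k + 1)."""
    return (r2 / k) / ((1 - r2) / error_df)


def f_from_ss(ss_regr, ss_resid, k, error_df):
    """F = (SSRegr / k) / (SSResid / error_df)."""
    return (ss_regr / k) / (ss_resid / error_df)


print(round(f_from_r2(0.996, 3, 146)))              # Exercise 14.19
print(round(f_from_r2(0.83, 6, 30), 3))             # Exercise 14.21
print(round(f_from_ss(1486.9, 2230.2, 4, 21), 3))   # Exercise 14.23
print(round(f_from_r2(0.06, 9, 1126), 3))           # Exercise 14.25
```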


14.27 a The MINITAB output is shown below.


The regression equation is
Catch Time = 1.44 - 0.0523 Prey Length + 0.00397 Prey Speed


Predictor      Coef        SE Coef      T       P
Constant       1.43958     [illegible]  [illegible]  [illegible]
Prey Length    -0.05227    0.01459      -3.58   0.002
Prey Speed     0.0039700   0.0006194    6.41    0.000


S = [illegible]   R-Sq = 75.0%   R-Sq(adj) = [illegible]


Analysis of Variance


Source           DF   SS           MS           F       P
Regression       2    [illegible]  [illegible]  24.02   0.000
Residual Error   16   [illegible]  [illegible]
Total            18   0.55478


The estimated regression equation is $\hat{y} = 1.43958 - 0.05227x_1 + 0.0039700x_2$, where y =
catch time, $x_1$ = prey length, and $x_2$ = prey speed.


b  When $x_1 = 6$ and $x_2 = 50$, $\hat{y} = 1.43958 - 0.05227(6) + 0.0039700(50) = 1.324$ seconds.


c 1. The model is $y = \alpha + \beta_1x_1 + \beta_2x_2 + e$, with the variables as defined above.
2. $H_0: \beta_1 = \beta_2 = 0$
3. $H_a$: At least one of the $\beta_i$'s is not zero.
4. $\alpha = 0.05$
5. \[ F = \frac{R^2/k}{(1 - R^2)/(n - (k + 1))} \]
6. The normal probability plot of the standardized residuals is shown below.


[Normal probability plot: Standardized Residual versus Normal Score]


There is a linear pattern in the plot, so we are justified in assuming that the random
deviations are normally distributed.

7. F = 24.02

8. P-value = 0.000.

9. Since P-value = 0.000 < 0.05 we reject $H_0$. We have convincing evidence that the
multiple regression model is useful for predicting catch time.
d  The values of the new variable are shown in the table below.


[Table of values; only fragments are legible, including the entries 60, 1.38, 0.1000 and 80, 1.50, 0.0625.]


The MINITAB output is shown below.


The regression equation is
Catch Time = 1.59 - 1.40 x


Predictor   Coef       SE Coef      T            P
Constant    1.58648    [illegible]  [illegible]  0.000
x           -1.4044    0.3124       -4.50        0.000

S = 0.122096   R-Sq = 54.3%   R-Sq(adj) = 51.6%


Analysis of Variance


Source           DF   SS           MS           F            P
Regression       1    [illegible]  [illegible]  [illegible]  0.000
Residual Error   17   [illegible]  [illegible]
Total            18   0.55478


The estimated regression equation is $\hat{y} = 1.58648 - 1.4044x$.


e  Since both the $R^2$ and the adjusted $R^2$ values shown in the computer outputs are greater for
the first model than for the second, the first model is preferable to the second. The first model 
is the one that accounts for the greater proportion of the observed variation in catch time. 
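
The comparison in Part (e) rests on adjusted R², which can be sketched with the standard formula 1 - (1 - R²)(n - 1)/(n - (k + 1)). The function name is hypothetical, and n = 19 is an assumption inferred from the total of 18 degrees of freedom in the analysis of variance tables.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))


n = 19  # assumed: total df of 18 in the ANOVA tables, plus 1
first_model = adjusted_r2(0.750, n, 2)    # prey length and prey speed
second_model = adjusted_r2(0.543, n, 1)   # single predictor x
print(round(first_model, 3), round(second_model, 3))
```

The second value agrees with the R-Sq(adj) = 51.6% shown in the output for the one-predictor model.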
14.29 a  SSResid = 390.435, SSTo = 1618.209, SSRegr = 1618.209 - 390.435 = 1227.775.


b  \[ R^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}} = 1 - \frac{390.435}{1618.209} = 0.759. \]
This tells us that 75.9% of the observed variation in shear strength can be explained by the
fitted model.


c 1. The model is $y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 + \beta_5x_5 + e$, where y = shear strength, $x_1$ =
depth, $x_2$ = water content, $x_3 = x_1^2$, $x_4 = x_2^2$, and $x_5 = x_1x_2$.
2. $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
3. $H_a$: At least one of the $\beta_i$'s is not zero.
4. $\alpha = 0.05$
5. \[ F = \frac{R^2/k}{(1 - R^2)/(n - (k + 1))} \]
6. The normal probability plot of the standardized residuals is shown below.


[Normal probability plot: Standardized Residual versus Normal Score]


The plot shows a linear pattern, so we are justified in assuming that the random
deviations are normally distributed.
7. \[ F = \frac{0.759/5}{0.241/8} = 5.031 \]
8. P-value = $P(F_{5,8} > 5.031) = 0.022$
9. Since P-value = 0.022 < 0.05 we reject $H_0$. We have convincing evidence that the
multiple regression model is useful. 
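
Parts (a) and (b) of 14.29 amount to two lines of arithmetic, sketched here with a hypothetical helper. With the rounded sums of squares printed above, SSRegr comes out as 1227.774; the manual's 1227.775 presumably reflects unrounded inputs.

```python
def regression_summary(ss_resid, ss_total):
    """Return (SSRegr, R^2) from the residual and total sums of squares."""
    ss_regr = ss_total - ss_resid
    r2 = 1 - ss_resid / ss_total
    return ss_regr, r2


ss_regr, r2 = regression_summary(390.435, 1618.209)
print(round(ss_regr, 3), round(r2, 3))
```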


14.31 1. The model is $y = \alpha + \beta_1x_1 + \beta_2x_2 + e$, where y = yield, $x_1$ = defoliation level, and $x_2 = x_1^2$.
2. $H_0: \beta_1 = \beta_2 = 0$
3. $H_a$: At least one of the $\beta_i$'s is not zero.
4. $\alpha = 0.01$
5. \[ F = \frac{R^2/k}{(1 - R^2)/(n - (k + 1))} \]
6. Since we do not have the original data set we are unable to check the conditions. We need to
assume that the variables are related according to the model given above, and that the random
deviations, e, are normally distributed with mean zero and fixed standard deviation.
7. \[ F = \frac{0.902/2}{(1 - 0.902)/21} = 96.643 \]
8. P-value = $P(F_{2,21} > 96.643) \approx 0$
9. Since P-value ≈ 0 < 0.01 we reject $H_0$. We have convincing evidence that the quadratic
model specifies a useful relationship between y and x.


14.33 The MINITAB output is shown below.


The regression equation is
[equation illegible]

Predictor   Coef         SE Coef      T            P
Constant    -151.4       [illegible]  [illegible]  [illegible]
x1          -16.216      [illegible]  [illegible]  [illegible]
x2          [illegible]  [illegible]  [illegible]  [illegible]
x3          [illegible]  [illegible]  [illegible]  [illegible]
x4          -0.2528      [illegible]  [illegible]  [illegible]
x5          0.4922       0.2281       2.16         0.063

S = [illegible]   R-Sq = 75.9%   R-Sq(adj) = 60.8%


Analysis of Variance


Source           DF   SS           MS           F            P
Regression       5    [illegible]  [illegible]  [illegible]  [illegible]
Residual Error   8    390.64       48.83
Total            13   [illegible]


This verifies the estimated regression equation given in the question.


14.35 The MINITAB output is shown below.


The regression equation is
y = 35.8 - 0.68 x1 + 1.28 x2


Predictor   Coef      SE Coef      T            P
Constant    35.83     [illegible]  0.67         0.508
x1          -0.676    [illegible]  [illegible]  [illegible]
x2          1.2811    0.4243       [illegible]  [illegible]

S = [illegible]   R-Sq = [illegible]   R-Sq(adj) = [illegible]


Analysis of Variance


Source           DF   SS      MS      F       P
Regression       2    20008   10004   18.95   0.000
Residual Error   31   16369   528
Total            33   36377


The estimated regression equation is $\hat{y} = 35.83 - 0.676x_1 + 1.2811x_2$, where y = infestation rate,
$x_1$ = mean temperature, and $x_2$ = mean relative humidity.


1. The model is $y = \alpha + \beta_1x_1 + \beta_2x_2 + e$, with the variables defined as above.
2. $H_0: \beta_1 = \beta_2 = 0$
3. $H_a$: At least one of the $\beta_i$'s is not zero.
4. $\alpha = 0.05$
5. \[ F = \frac{R^2/k}{(1 - R^2)/(n - (k + 1))} \]
6. The normal probability plot of the standardized residuals is shown below.


[Normal probability plot: Standardized Residual versus Normal Score]


The plot shows a linear pattern, so we are justified in assuming that the random deviations are
normally distributed.

7. F = 18.95

8. P-value = 0.000

9. Since P-value = 0.000 < 0.05 we reject $H_0$. We have convincing evidence that the multiple
regression model is useful.
Chapter 15
Analysis of Variance


Note: In this chapter, numerical answers to questions involving the normal, t, chi square, and F
distributions were found using values from a calculator. Students using statistical tables will find that
their answers differ slightly from those given.


15.1 a P-value = P(F,,, > 5.37) = 0.007. 
b P-value = P(F, ,; > 1.90) = 0.163. 
c P-value = P(F,,; > 4.89) = 0.010. 
d P-value = P(F, 5) > 14.48) = 0.000. 
e P-value = P(F, 5) > 2.69) = 0.074. 
f P-value = P(Fy 5) > 3.24) = 0.019. 


15.3 a  Let $\mu_1, \mu_2, \mu_3, \mu_4$ be the mean lengths of stay for people participating in the four health
plans.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least two among $\mu_1, \mu_2, \mu_3, \mu_4$ are different.


b  $df_1 = k - 1 = 3$, $df_2 = N - k = 32 - 4 = 28$.
P-value = $P(F_{3,28} > 4.37) = 0.012$.
Since P-value = 0.012 > 0.01 we do not reject $H_0$. We do not have convincing evidence that
mean length of stay is related to health plan.


c  $df_1 = k - 1 = 3$, $df_2 = N - k = 32 - 4 = 28$.
P-value = $P(F_{3,28} > 4.37) = 0.012$.
Since P-value = 0.012 > 0.01 we do not reject $H_0$. We do not have convincing evidence that
mean length of stay is related to health plan.


15.5 Summary statistics are given in the table below. 


1. Let μ₁, μ₂, μ₃, μ₄ be the mean ratings for the four restrictive rating labels.
2. H₀: μ₁ = μ₂ = μ₃ = μ₄
3. Hₐ: At least two among μ₁, μ₂, μ₃, μ₄ are different.
4. α = 0.05
5. F = MSTr/MSE


6. Boxplots for the four groups are shown below. 


[Boxplots of rating (axis from 0 to 10) for the four label groups, including the 12+, 16+, and
18+ labels]


The boxplots are close enough to being symmetrical, and there are no outliers. The largest 
standard deviation (2.098) is not more than twice the smallest (1.449). We are told to assume 
that the boys were randomly assigned to the four age label ratings. 

7. N = 10 + 10 + 10 + 10 = 40

Grand total = 10(4.8) + 10(6.8) + 10(7.1) + 10(8.1) = 268

x̄ = grand total/N = 268/40 = 6.7

SSTr = n₁(x̄₁ − x̄)² + n₂(x̄₂ − x̄)² + n₃(x̄₃ − x̄)² + n₄(x̄₄ − x̄)²
= 10(4.8 − 6.7)² + 10(6.8 − 6.7)² + 10(7.1 − 6.7)² + 10(8.1 − 6.7)²
= 57.4

Treatment df = k − 1 = 3

SSE = (n₁ − 1)s₁² + (n₂ − 1)s₂² + (n₃ − 1)s₃² + (n₄ − 1)s₄²
= 9(4.4) + 9(2.622) + 9(2.322) + 9(2.1)
= 103

Error df = N − k = 40 − 4 = 36

F = MSTr/MSE = (SSTr/treatment df)/(SSE/error df) = (57.4/3)/(103/36) = 6.687

8. P-value = P(F₃,₃₆ > 6.687) = 0.001

9. Since P-value = 0.001 < 0.05, we reject H₀. We have convincing evidence that the mean
ratings for the four restrictive rating labels are not all equal.
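
The arithmetic in Step 7 can be reproduced from the group sizes, means, and variances alone. The following is a minimal sketch with an illustrative function name; it uses the rounded summary values shown in the calculation above, so the last digit of F may differ slightly from the value obtained with unrounded inputs.

```python
def one_way_anova_from_summary(ns, means, variances):
    """Compute SSTr, SSE, both df values, and F from per-group summary statistics."""
    k = len(ns)
    total_n = sum(ns)
    grand_total = sum(n * m for n, m in zip(ns, means))
    grand_mean = grand_total / total_n
    # Treatment sum of squares: weighted squared deviations of group means.
    sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Error sum of squares: pooled within-group variation.
    sse = sum((n - 1) * v for n, v in zip(ns, variances))
    df_treatment = k - 1
    df_error = total_n - k
    f_stat = (sstr / df_treatment) / (sse / df_error)
    return {
        "grand_mean": grand_mean,
        "SSTr": sstr,
        "SSE": sse,
        "df_treatment": df_treatment,
        "df_error": df_error,
        "F": f_stat,
    }

# Summary values as they appear in the solution to Exercise 15.5.
result = one_way_anova_from_summary(
    ns=[10, 10, 10, 10],
    means=[4.8, 6.8, 7.1, 8.1],
    variances=[4.4, 2.622, 2.322, 2.1],
)
for name, value in result.items():
    print(name, round(value, 3))
```

The same function applies to unequal group sizes, since each group's contribution is weighted by its own n.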
15.7 Summary statistics are shown in the table below.


[Summary statistics table; the legible standard deviations are 4.081, 3.441, and 2.401]


Let μ₁, μ₂, μ₃, μ₄ be the mean numbers of pretzels consumed for the four treatments.
H₀: μ₁ = μ₂ = μ₃ = μ₄
Hₐ: At least two among μ₁, μ₂, μ₃, μ₄ are different.


α = 0.05
F = MSTr/MSE


Boxplots for the four groups are shown below. 


[Boxplots of number of pretzels consumed (axis from 0 to 14) for Treatments 1 to 4]


The boxplots are roughly symmetric, and there are no outliers. The largest standard deviation 
(4.081) is not more than twice the smallest (2.129). We are told that the men were randomly 
assigned to the four treatments. 

N = n₁ + n₂ + n₃ + n₄ = 74

Grand total = n₁x̄₁ + n₂x̄₂ + n₃x̄₃ + n₄x̄₄ = 367

x̄ = grand total/N = 367/74 = 4.959

SSTr = n₁(x̄₁ − x̄)² + n₂(x̄₂ − x̄)² + n₃(x̄₃ − x̄)² + n₄(x̄₄ − x̄)²
= 162.363
Treatment df = k − 1 = 3
SSE = (n₁ − 1)s₁² + (n₂ − 1)s₂² + (n₃ − 1)s₃² + (n₄ − 1)s₄²
= 718.515
Error df = N − k = 70
F = MSTr/MSE = (SSTr/treatment df)/(SSE/error df) = (162.363/3)/(718.515/70) = 5.273
P-value = P(F₃,₇₀ > 5.273) = 0.002
Since P-value = 0.002 < 0.05, we reject H₀. We have convincing evidence that the mean
numbers of pretzels consumed for the four treatments are not all equal.


15.9 Let μ₁, μ₂, μ₃, μ₄ be the mean changes in body fat mass for the four treatments.


H₀: μ₁ = μ₂ = μ₃ = μ₄
Hₐ: At least two among μ₁, μ₂, μ₃, μ₄ are different.


α = 0.05
F = MSTr/MSE


Boxplots for the four groups are shown below. 


[Boxplots of change in body fat mass for the four treatments]


The boxplots are roughly symmetric, and there are no outliers. The largest standard deviation 
(1.443) is not more than twice the smallest (1.122). We are told that the men were randomly 
assigned to the four treatments. 


N = 74
Grand total = −158.3
x̄ = grand total/N = −158.3/74 = −2.139


SSTr = n₁(x̄₁ − x̄)² + n₂(x̄₂ − x̄)² + n₃(x̄₃ − x̄)² + n₄(x̄₄ − x̄)²
= 247.403
Treatment df = k − 1 = 3
SSE = (n₁ − 1)s₁² + (n₂ − 1)s₂² + (n₃ − 1)s₃² + (n₄ − 1)s₄²
= 107.314
Error df = N − k = 70
F = MSTr/MSE = (SSTr/treatment df)/(SSE/error df) = (247.403/3)/(107.314/70) = 53.793
P-value = P(F₃,₇₀ > 53.793) ≈ 0
Since P-value ≈ 0 < 0.05, we reject H₀. We have convincing evidence that the mean change in
body fat differs for the four treatments.
15.11 1. Let μ₁, μ₂, μ₃ be the mean Hopkins scores for the three populations.
2. H₀: μ₁ = μ₂ = μ₃
3. Hₐ: At least two among μ₁, μ₂, μ₃ are different.
4. α = 0.05
5. F = MSTr/MSE
6. We are told to treat the samples as random samples from their respective populations. We
have to assume that the population Hopkins score distributions are approximately normal
with the same standard deviation.


7. N = n₁ + n₂ + n₃ = 234
Grand total = n₁x̄₁ + n₂x̄₂ + n₃x̄₃ = 7064.66

x̄ = grand total/N = 7064.66/234 = 30.191

SSTr = n₁(x̄₁ − x̄)² + n₂(x̄₂ − x̄)² + n₃(x̄₃ − x̄)²
= 100.786

Treatment df = k − 1 = 2

SSE = (n₁ − 1)s₁² + (n₂ − 1)s₂² + (n₃ − 1)s₃²
= 4409.036

Error df = N − k = 231

F = MSTr/MSE = (SSTr/treatment df)/(SSE/error df) = (100.786/2)/(4409.036/231) = 2.640

8. P-value = P(F₂,₂₃₁ > 2.640) = 0.074
9. Since P-value = 0.074 > 0.05, we do not reject H₀. We do not have convincing evidence that
the mean Hopkins scores are not the same for all three student populations.


15.13 k = 4, N = 20. Treatment df = k − 1 = 3. Error df = N − k = 16.
SSTr = SSTo — SSE = 310500.76 — 235419.04 = 75081.72. 
The completed table is shown below. 


Source of Variation | df | Sum of Squares | Mean Square | F
Treatments          |  3 |  75081.72      | 25027.24    | 1.701
Error               | 16 | 235419.04      | 14713.69    |
Total               | 19 | 310500.76      |             |


1. Let μ₁, μ₂, μ₃, μ₄ be the mean number of miles until failure for the four given brands of
spark plug.
2. H₀: μ₁ = μ₂ = μ₃ = μ₄
3. Hₐ: At least two among μ₁, μ₂, μ₃, μ₄ are different.
4. α = 0.05
5. F = MSTr/MSE


6. We need to treat the samples as random samples from their respective populations, and 
assume that the population distributions are approximately normal with the same standard 
deviation. 
7. F = 1.701

8. P-value = P(F₃,₁₆ > 1.701) = 0.207

9. Since P-value = 0.207 > 0.05, we do not reject H₀. We do not have convincing evidence that
the mean number of miles to failure is not the same for all four brands of spark plug.
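
The missing entries of the ANOVA table in this exercise follow from the two sums of squares given and the degrees of freedom. A short sketch of that bookkeeping, with illustrative variable names, is shown here.

```python
# Quantities stated in the solution to Exercise 15.13.
k = 4                 # number of spark plug brands
total_n = 20          # total number of observations
ss_total = 310500.76
ss_error = 235419.04

# Degrees of freedom for each row of the ANOVA table.
df_treatment = k - 1
df_error = total_n - k
df_total = total_n - 1

# The treatment sum of squares is found by subtraction.
ss_treatment = ss_total - ss_error

# Mean squares and the F statistic.
ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error
f_stat = ms_treatment / ms_error

print("Treatments", df_treatment, round(ss_treatment, 2), round(ms_treatment, 2), round(f_stat, 3))
print("Error", df_error, round(ss_error, 2), round(ms_error, 2))
print("Total", df_total, round(ss_total, 2))
```

The degrees of freedom add up in the same way as the sums of squares: treatment df plus error df equals total df.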


15.15 Since there is a significant difference in all three of the pairs we need a set of intervals none of 
which includes zero. Set 3 is therefore the required set. 


15.17 a In decreasing order of the resulting mean numbers of pretzels eaten, the treatments were:
slides with related text, slides with no text, slides with unrelated text, and no slides. There
were no significant differences between the results for slides with no text and slides with
unrelated text, or between slides with unrelated text and no slides. However, there was a
significant difference between the results for slides with related text and each of the other
treatments, and between the results for no slides and for slides with no text (and for slides
with related text).


b The results for the women and men are almost exactly the reverse of one another, with, for
example, slides with related text (treatment 2) resulting in the smallest mean number of
pretzels eaten for the women and the largest mean number of pretzels eaten for the men. For
the men, treatment 2 was significantly different from all the other treatments; however, for
the women, treatment 2 was not significantly different from treatment 1. For both women and
men there was a significant difference between treatments 1 and 4 and no significant
difference between treatments 3 and 4. However, between treatments 1 and 3 there was a
significant difference for the women but no significant difference for the men.


15.19 a
              Driving   Shooting   Fighting
Sample mean   3.42      4.00       (illegible)

b
              Driving   Shooting   Fighting
Sample mean   2.81      3.44       4.01


15.21 a N = n₁ + n₂ + n₃ + n₄ = 80
Grand total = n₁x̄₁ + n₂x̄₂ + n₃x̄₃ + n₄x̄₄ = 158


x̄ = grand total/N = 158/80 = 1.975

SSTr = n₁(x̄₁ − x̄)² + n₂(x̄₂ − x̄)² + n₃(x̄₃ − x̄)² + n₄(x̄₄ − x̄)²
= 13.450

Treatment df = k − 1 = 3

SSE = (n₁ − 1)s₁² + (n₂ − 1)s₂² + (n₃ − 1)s₃² + (n₄ − 1)s₄²
= 7.465

Error df = N − k = 76

F = MSTr/MSE = (SSTr/treatment df)/(SSE/error df) = (13.450/3)/(7.465/76) = 45.644

The ANOVA table is shown below.
1. Let μ₁, μ₂, μ₃, μ₄ be the mean numbers of seeds germinating for the four treatments.
2. H₀: μ₁ = μ₂ = μ₃ = μ₄
3. Hₐ: At least two among μ₁, μ₂, μ₃, μ₄ are different.
4. α = 0.05
5. F = MSTr/MSE
6. We need to assume that the samples of 100 seeds collected from each treatment were
random samples from those populations, and that the population distributions of numbers
of seeds germinating are approximately normal with the same standard deviation.

7. F = 45.644

8. P-value = P(F₃,₇₆ > 45.644) ≈ 0

9. Since P-value ≈ 0 < 0.05, we reject H₀. We have convincing evidence that the mean
number of seeds germinating is not the same for all four treatments.


b We will construct the T-K interval for μ₂ − μ₃.
Appendix Table 7 gives the 95% Studentized range critical value q = 3.74 (using k = 4 and
error df = 60, the closest tabled value to df = N − k = 76). The T-K interval for μ₂ − μ₃ is


(2.35 − 1.70) ± 3.74 √((MSE/2)(1/20 + 1/20)) = (0.388, 0.912),

with MSE = SSE/error df = 7.465/76 ≈ 0.098.
Since this interval does not contain zero, we have convincing evidence that seeds eaten and
then excreted by lizards germinate at a different rate from those eaten and then excreted by
birds. Therefore, since the sample mean was higher for the lizard dung treatment than for the
bird dung treatment, we have convincing evidence that seeds eaten and then excreted by
lizards germinate at a higher rate than those eaten and then excreted by birds.
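
The T-K interval in Part (b) can be recomputed from the quantities given in Part (a). The sketch below is illustrative only: the function name is made up, the group sizes of 20 are inferred from N = 80 split equally across four treatments, and the critical value 3.74 is the tabled value for error df = 60 quoted above, not an exact value for df = 76.

```python
import math

def tk_interval(mean_i, mean_j, n_i, n_j, mse, q):
    """Tukey-Kramer interval for the difference of two treatment means."""
    margin = q * math.sqrt((mse / 2.0) * (1.0 / n_i + 1.0 / n_j))
    diff = mean_i - mean_j
    return diff - margin, diff + margin

# MSE from Part (a): SSE divided by error df.
mse = 7.465 / 76

# Sample means of the two treatments being compared, as used in the solution.
lower, upper = tk_interval(2.35, 1.70, n_i=20, n_j=20, mse=mse, q=3.74)

print(round(lower, 3), round(upper, 3))
# The interval excludes zero exactly when both endpoints have the same sign.
print("excludes zero:", lower > 0 or upper < 0)
```

Using the df = 60 critical value makes the interval slightly wider than an exact df = 76 value would, so a conclusion that the interval excludes zero is not weakened by the table lookup.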